1 Rates of Change



1.1 Change in discrete steps

Toward the end of the eighteenth 
century, a German elementary 
school teacher decided to keep his 
pupils busy by assigning them a 
long, boring arithmetic problem: 
to add up all the numbers from 
one to a hundred (footnote 1). The
children set to work on their slates,
and the teacher lit his pipe, con- 
fident of a long break. But al- 
most immediately, a boy named 
Carl Friedrich Gauss brought up 
his answer: 5,050. 

a / Adding the numbers 
from 1 to 7. 



b / A trick for finding the 
sum. 

Figure a suggests one way of solving this type of problem. The
filled-in columns of the graph represent the numbers from 1 to 7, and
adding them up means finding the area of the shaded region. Roughly
half the square is shaded in, so if we want only an approximate
solution, we can simply calculate $7^2/2 = 24.5$.

But, as suggested in figure b, it's not much more work to get an exact
result. There are seven sawteeth sticking out above the diagonal, with
a total area of 7/2, so the total shaded area is $(7^2 + 7)/2 = 28$. In
general, the sum of the first n numbers will be $(n^2 + n)/2$, which
explains Gauss's result: $(100^2 + 100)/2 = 5,050$.



1 I'm giving my own retelling of a
hoary legend. We don't really know the 
exact problem, just that it was supposed 
to have been something of this flavor. 



Two sides of the same coin 

Problems like this come up fre- 
quently. Imagine that each house- 
hold in a certain small town sends 
a total of one ton of garbage to the 
dump every year. Over time, the 
garbage accumulates in the dump, 
taking up more and more space.

c / Carl Friedrich Gauss (1777-1855), a long time after graduating
from elementary school.

Let's label the years as n = 1, 2, 3, . . ., and let the function x(n)
(footnote 2) represent the amount of garbage that has accumulated by
the end of year n. If the population is constant, say 13 households,
then garbage accumulates at a constant rate, and we have x(n) = 13n.

But maybe the town's population 
is growing. If the population starts 
out as 1 household in year 1, and 
then grows to 2 in year 2, and so 
on, then we have the same kind 
of problem that the young Gauss 
solved. After 100 years, the accu- 
mulated amount of garbage will be 
5,050 tons. The pile of refuse grows 
more quickly every year; the rate of 
change of x is not constant. Tabu- 
lating the examples we've done so 
far, we have this: 



2 Recall that when x is a function, the
notation x(n) means the output of the 
function when the input is n. It doesn't 
represent multiplication of a number x by 
a number n. 



rate of change    accumulated result
13                $13n$
n                 $(n^2 + n)/2$

The rate of change of the function x can be notated as $\dot{x}$.
Given the function $\dot{x}$, we can always determine the function x
for any value of n by doing a running sum.

Likewise, if we know x, we can determine $\dot{x}$ by subtraction. In
the example where x = 13n, we can find
$\dot{x} = x(n) - x(n-1) = 13n - 13(n-1) = 13$. Or if we knew that the
accumulated amount of garbage was given by $(n^2 + n)/2$, we could
calculate the town's population like this:

$$\frac{n^2 + n}{2} - \frac{(n-1)^2 + (n-1)}{2}
  = \frac{n^2 + n - (n^2 - 2n + 1 + n - 1)}{2}
  = \frac{2n}{2}
  = n$$
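The running sum and the subtraction that undoes it can be tried out in a few lines of Python. This is only an illustrative sketch: the helper names `running_sum` and `differences` are made up for this example, and the starting value x(0) = 0 is assumed, as in the text.

```python
from itertools import accumulate

def running_sum(rates):
    """Accumulate yearly rates into the running totals x(1), x(2), ..."""
    return list(accumulate(rates))

def differences(totals, start=0):
    """Recover the yearly rates by subtracting each total from the next one.

    `start` plays the role of x(0), assumed here to be zero.
    """
    previous = [start] + totals[:-1]
    return [now - before for now, before in zip(totals, previous)]

# Growing town: n households in year n, one ton of garbage per household.
rates = list(range(1, 101))
totals = running_sum(rates)

print(totals[-1])                    # accumulated garbage after 100 years
print(differences(totals) == rates)  # subtraction undoes the running sum
```

Feeding in a constant rate, such as `[13] * 5`, gives back the totals 13, 26, 39, ... from the constant-population example.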




d / $\dot{x}$ is the slope of x.



The graphical interpretation of this is shown in figure d: on a graph
of $x = (n^2 + n)/2$, the slope of the line connecting two successive
points is the value of the function $\dot{x}$.

In other words, the functions x and $\dot{x}$ are like different sides
of the same coin. If you know one, you can find the other, with two
caveats.

First, we've been assuming implicitly that the function x starts out
at x(0) = 0. That might not be true in general. For instance, if we're
adding water to a reservoir over a certain period of time, the
reservoir probably didn't start out completely empty. Thus, if we know
$\dot{x}$, we can't find out everything about x without some further
information: the starting value of x. If someone tells you
$\dot{x} = 13$, you can't conclude x = 13n, but only x = 13n + c, where
c is some constant. There's no such ambiguity if you're going the
opposite way, from x to $\dot{x}$. Even if $x(0) \ne 0$, we still have
$\dot{x} = 13n + c - [13(n-1) + c] = 13$.

Second, it may be difficult, or even impossible, to find a formula for
the answer when we want to determine the running sum x given a formula
for the rate of change $\dot{x}$. Gauss had a flash of insight that led
him to the result $(n^2 + n)/2$, but in general we might only be able
to use a computer spreadsheet to calculate a number for the running
sum, rather than an equation that would be valid for all values of n.

Some guesses 

Even though we lack Gauss's genius, we can recognize certain patterns.
One pattern is that if $\dot{x}$ is a function that gets bigger and
bigger, it seems like x will be a function that grows even faster than
$\dot{x}$. In the example of $\dot{x} = n$ and $x = (n^2 + n)/2$,
consider what happens for a large value of n, like 100. At this value
of n, $\dot{x} = 100$, which is pretty big, but even without pawing
around for a calculator, we know that x is going to turn out really
really big. Since n is large, $n^2$ is quite a bit bigger than n, so
roughly speaking, we can approximate $x \approx n^2/2 = 5,000$. 100 may
be a big number, but 5,000 is a lot bigger. Continuing in this way, for
n = 1000 we have $\dot{x} = 1000$, but $x \approx 500,000$; now x has
far outstripped $\dot{x}$. This can be a fun game to play with a
calculator: look at which functions grow the fastest. For instance,
your calculator might have an $x^2$ button, an $e^x$ button, and a
button for $x!$ (the factorial function, defined as
$x! = 1 \cdot 2 \cdot \ldots \cdot x$, e.g.,
$4! = 1 \cdot 2 \cdot 3 \cdot 4 = 24$). You'll find that $50^2$ is
pretty big, but $e^{50}$ is incomparably greater, and $50!$ is so big
that it causes an error.
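The same game can be played in Python instead of on a calculator. This is a small sketch using only the standard `math` module; unlike a pocket calculator, Python's integers have no fixed size limit, so 50! does not overflow here.

```python
import math

n = 50
square = n ** 2                # the x^2 button
exponential = math.exp(n)      # the e^x button
factorial = math.factorial(n)  # the x! button

print(f"{n}^2 = {square}")
print(f"e^{n} is about {exponential:.3e}")
print(f"{n}! has {len(str(factorial))} digits")
```

Changing `n` shows how quickly the three functions separate from one another.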

All the x and $\dot{x}$ functions we've seen so far have been
polynomials. If x is a polynomial, then of course we can find a
polynomial for $\dot{x}$ as well, because if x is a polynomial, then
$x(n) - x(n-1)$ will be one too. It also looks like every polynomial we
could choose for $\dot{x}$ might also correspond to an x that's a
polynomial. And not only that, but it looks as though there's a pattern
in the power of n. Suppose x is a polynomial, and the highest power of
n it contains is a certain number, the "order" of the polynomial. Then
$\dot{x}$ is a polynomial of that order minus one. Again, it's fairly
easy to prove this going one way, passing from x to $\dot{x}$, but more
difficult to prove the opposite relationship: that if $\dot{x}$ is a
polynomial of a certain order, then x must be a polynomial with an
order that's greater by one.

We'd imagine, then, that the running sum of $\dot{x} = n^2$ would be a
polynomial of order 3. If we calculate
$x(100) = 1^2 + 2^2 + \ldots + 100^2$ on a computer spreadsheet, we get
338,350, which looks suspiciously close to 1,000,000/3. It looks like
$x(n) = n^3/3 + \ldots$, where the dots represent terms involving lower
powers of n such as $n^2$. The fact that the coefficient of the $n^3$
term is 1/3 is proved in problem 20 on p. 23.
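The spreadsheet calculation can also be done with a short Python loop. This sketch (the function name `sum_of_squares` is just a label chosen here) prints the running sum next to the guess $n^3/3$ so that the ratio can be watched as n grows.

```python
def sum_of_squares(n):
    """Running sum 1^2 + 2^2 + ... + n^2, computed by brute force."""
    return sum(k * k for k in range(1, n + 1))

for n in (10, 100, 1000):
    exact = sum_of_squares(n)
    guess = n ** 3 / 3
    # The ratio drifts toward 1 as n grows, which is what we would
    # expect if the leading term of x(n) is n^3/3.
    print(n, exact, round(exact / guess, 4))
```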






Example 1
Figure e shows a pyramid consisting of a single cubical block on top,
supported by a 2 x 2 layer, supported in turn by a 3 x 3 layer. The
total volume is $1^2 + 2^2 + 3^2$, in units of the volume of a single
block.

e / A pyramid with a volume of $1^2 + 2^2 + 3^2$.

Generalizing to the sum $x(n) = 1^2 + 2^2 + \ldots + n^2$, and applying
the result of the preceding paragraph, we find that the volume of such
a pyramid is approximately $(1/3)Ah$, where $A = n^2$ is the area of
the base and $h = n$ is the height.

When n is very large, we can get as good an approximation as we like
to a smooth-sided pyramid, and the error incurred in
$x(n) \approx (1/3)n^3 + \ldots$ by omitting the lower-order terms
$\ldots$ can be made as small as desired.

We therefore conclude that the volume is exactly $(1/3)Ah$ for a
smooth-sided pyramid with these proportions.

This is a special case of a theorem 
first proved by Euclid (propositions 
XII-6 and XII-7) two thousand years 
before calculus was invented. 

1.2 Continuous change

Did you notice that I sneaked something past you in the example of
water filling up a reservoir? The x and $\dot{x}$ functions I've been
using as examples have all been functions defined on the integers, so
they represent change that happens in discrete steps, but the flow of
water into a reservoir is smooth and continuous. Or is it? Water is
made out of molecules, after all. It's just that water molecules are
so small that we don't notice them as individuals. Figure g shows a
graph that is discrete, but almost appears continuous because the
scale has been chosen so that the points blend together visually.

f / Isaac Newton (1643-1727)

g / On this scale, the graph of $(n^2 + n)/2$ appears almost
continuous.

The physicist Isaac Newton started thinking along these lines in the
1660's, and figured out ways of analyzing x and $\dot{x}$ functions
that were truly continuous. The notation $\dot{x}$ is due to him (and
he only used it for continuous functions). Because he was dealing with
the continuous flow of change, he called his new set of mathematical
techniques the method of fluxions, but nowadays it's known as the
calculus.

h / The function $x(t) = t^2/2$, and its tangent line at the point
(1, 1/2).



Newton was a physicist, and he needed to invent the calculus as part
of his study of how objects move. If an object is moving in one
dimension, we can specify its position with a variable x, and x will
then be a function of time, t. The rate of change of its position,
$\dot{x}$, is its speed, or velocity. Earlier experiments by Galileo
had established that when a ball rolled down a slope, its position was
proportional to $t^2$, so Newton inferred that a graph like figure h
would be typical for any object moving under the influence of a
constant force. (It could be $7t^2$, or $t^2/42$, or anything else
proportional to $t^2$, depending on the force acting on the object and
the object's mass.)

i / This line isn't a tangent line: it crosses the graph.

Because the functions are continuous, not discrete, we can no longer
define the relationship between x and $\dot{x}$ by saying x is a
running sum of $\dot{x}$'s, or that $\dot{x}$ is the difference between
two successive x's. But we already found a geometrical relationship
between the two functions in the discrete case, and that can serve as
our definition for the continuous case: x is the area under the graph
of $\dot{x}$, or, if you like, $\dot{x}$ is the slope of the tangent
line on the graph of x. For now we'll concentrate on the slope idea.

The tangent line is defined as the line that passes through the graph
at a certain point, but, unlike the one in figure i, doesn't cut
across the graph. (For a more formal definition, see page 135.) By
measuring with a ruler on figure h, we find that the slope is very
close to 1, so evidently $\dot{x}(1) = 1$. To prove this, we construct
the function representing the line: $\ell(t) = t - 1/2$. We want to
prove that this line doesn't cross the graph of $x(t) = t^2/2$. The
difference between the two functions, $x - \ell$, is the polynomial
$t^2/2 - t + 1/2$, and this polynomial will be zero for any value of t
where the line touches or crosses the curve. We can use the quadratic
formula to find these points, and the result is that there is only one
of them, which is t = 1. Since $x - \ell$ is positive for at least some
points to the left and right of t = 1, and it only equals zero at
t = 1, it must never be negative, which means that the line always
lies below the curve, never crossing it.
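The quadratic-formula step in this argument is easy to check by machine. The following sketch applies the quadratic formula to the difference $t^2/2 - t + 1/2$ and then samples it at a few points; the helper name `gap` is an invention for this illustration.

```python
import math

# Coefficients of the difference t^2/2 - t + 1/2, written as a t^2 + b t + c.
a, b, c = 0.5, -1.0, 0.5

discriminant = b * b - 4 * a * c
# Collect the distinct roots; a zero discriminant leaves only one.
roots = sorted({(-b + sign * math.sqrt(discriminant)) / (2 * a)
                for sign in (+1, -1)})
print(discriminant, roots)

def gap(t):
    """Vertical distance from the line t - 1/2 up to the curve t^2/2."""
    return 0.5 * t * t - t + 0.5

# The gap is never negative, so the line never rises above the curve.
print(all(gap(t / 10) >= 0 for t in range(-50, 51)))
```

A vanishing discriminant is the algebraic signature of a line that touches the curve at one point without crossing it.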



A derivative 

That proves that $\dot{x}(1) = 1$, but it was a lot of work, and we
don't want to do that much work to evaluate $\dot{x}$ at every value
of t. There's a way to avoid all that, and find a formula for
$\dot{x}$. Compare figures h and j. They're both graphs of the same
function, and they both look the same. What's different? The only
difference is the scales: in figure j, the t axis has been shrunk by a
factor of 2, and the x axis by a factor of 4. The graph looks the
same, because doubling t quadruples $t^2/2$. The tangent line here is
the tangent line at t = 2, not t = 1, and although it looks like the
same line as the one in figure h, it isn't, because the scales are
different. The line in figure h had a slope of rise/run = 1/1 = 1, but
this one's slope is 4/2 = 2. That means $\dot{x}(2) = 2$. In general,
this scaling argument shows that $\dot{x}(t) = t$ for any t.
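The result that the slope of $t^2/2$ equals t can be spot-checked numerically. The sketch below is not the scaling argument itself: it swaps in a different technique, estimating the slope from two nearby points on the graph, and the step size `h` is an arbitrary choice.

```python
def x(t):
    return t ** 2 / 2

def slope(f, t, h=1e-3):
    """Slope of the line through two nearby points on the graph, centered on t."""
    return (f(t + h) - f(t - h)) / (2 * h)

for t in (1.0, 2.0, 5.0):
    # Each estimated slope should come out very close to t itself.
    print(t, round(slope(x, t), 6))
```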




j / The function $t^2/2$ again. How is this different from figure h?



This is called differentiating: finding a formula for the function
$\dot{x}$, given a formula for the function x. The term comes from the
idea that for a discrete function, the slope is the difference between
two successive values of the function. The function $\dot{x}$ is
referred to as the derivative of the function x, and the art of
differentiating is differential calculus. The opposite process,
computing a formula for x when given $\dot{x}$, is called integrating,
and makes up the field of integral calculus; this terminology is based
on the idea that computing a running sum is like putting together
(integrating) many little pieces.

Note the similarity between this result for continuous functions,

$x = t^2/2 \qquad \dot{x} = t$

and our earlier result for discrete ones,

$x = (n^2 + n)/2 \qquad \dot{x} = n$

The similarity is no coincidence. A continuous function is just a
smoothed-out version of a discrete one. For instance, the continuous
version of the staircase function shown in figure b on page 7 would
simply be a triangle without the sawteeth sticking out; the area of
those ugly sawteeth is what's represented by the n/2 term in the
discrete result $x = (n^2 + n)/2$, which is the only thing that makes
it different from the continuous result $x = t^2/2$.



Properties of the derivative 

It follows immediately from the definition of the derivative that
multiplying a function by a constant multiplies its derivative by the
same constant, so for example since we know that the derivative of
$t^2/2$ is t, we can immediately tell that the derivative of $t^2$ is
$2t$, and the derivative of $t^2/17$ is $2t/17$.

Also, if we add two functions, their derivatives add. To give a good
example of this, we need to have another function that we can
differentiate, one that isn't just some multiple of $t^2$. An easy one
is t: the derivative of t is 1, since the graph of x = t is a line
with a slope of 1, and the tangent line lies right on top of the
original line.

The derivative of a constant is zero, since a constant function's
graph is a horizontal line, with a slope of zero. We now know enough
to differentiate a second-order polynomial.

Example 2
The derivative of $5t^2 + 2t$ is the derivative of $5t^2$ plus the
derivative of $2t$, since derivatives add. The derivative of $5t^2$ is
5 times the derivative of $t^2$, and the derivative of $2t$ is 2 times
the derivative of t, so putting everything together, we find that the
derivative of $5t^2 + 2t$ is $(5)(2t) + (2)(1) = 10t + 2$.

Example 3
> An insect pest from the United States is inadvertently released in a
village in rural China. The pests spread outward at a rate of s
kilometers per year, forming a widening circle of contagion. Find the
number of square kilometers per year that become newly infested. Check
that the units of the result make sense. Interpret the result.

> Let t be the time, in years, since the pest was introduced. The
radius of the circle is r = st, and its area is
$a = \pi r^2 = \pi (st)^2$. To make this look like a polynomial, we
have to rewrite this as $a = (\pi s^2)t^2$. The derivative is

$\dot{a} = (\pi s^2)(2t)$

$\dot{a} = (2\pi s^2)t$

The units of s are km/year, so squaring it gives km^2/year^2. The 2
and the $\pi$ are unitless, and multiplying by t gives units of
km^2/year, which is what we expect for $\dot{a}$, since it represents
the number of square kilometers per year that become infested.

Interpreting the result, we notice a couple of things. First, the rate
of infestation isn't constant; it's proportional to t, so people might
not pay so much attention at first, but later on the effort required
to combat the problem will grow more and more quickly. Second, we
notice that the result is proportional to $s^2$. This suggests that
anything that could be done to reduce s would be very helpful. For
instance, a measure that cut s in half would reduce $\dot{a}$ by a
factor of four.
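The two conclusions of example 3 can be checked with numbers. In this sketch the values t = 3 years and s = 2 km/year are arbitrary choices for illustration, and the function names are made up here.

```python
import math

def infested_area(t, s):
    """Area of the infested circle, a = pi (s t)^2, in km^2."""
    return math.pi * (s * t) ** 2

def infestation_rate(t, s):
    """Derivative of the area with respect to time, (2 pi s^2) t, in km^2/year."""
    return 2 * math.pi * s ** 2 * t

t, s = 3.0, 2.0

# Compare the formula with a slope estimated from two nearby times.
h = 1e-6
estimate = (infested_area(t + h, s) - infested_area(t - h, s)) / (2 * h)
print(infestation_rate(t, s), estimate)

# Cutting s in half reduces the rate by a factor of four.
print(infestation_rate(t, s) / infestation_rate(t, s / 2))
```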



Higher-order polynomials 

So far, we have the following results for polynomials up to order 2:

function    derivative
1           0
t           1
$t^2$       $2t$

Interpreting 1 as $t^0$, we detect what seems to be a general rule,
which is that the derivative of $t^k$ is $kt^{k-1}$. The proof is
straightforward but not very illuminating if carried out with the
methods developed in this chapter, so I've relegated it to page 136.
It can be proved much more easily using the methods of chapter 2.



Example 4

> If $x = 2t^7 - 4t + 1$, find $\dot{x}$.

> This is similar to example 2, the only difference being that we can
now handle higher powers of t. The derivative of $t^7$ is $7t^6$, so we
have

$\dot{x} = (2)(7t^6) + (-4)(1) + 0$
$\dot{x} = 14t^6 - 4$
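The term-by-term procedure used in examples 2 and 4 is mechanical enough to hand to a computer. As a minimal sketch, assume a polynomial is stored as a dictionary mapping each power of t to its coefficient; this representation and the name `differentiate` are choices made for this illustration.

```python
def differentiate(poly):
    """Differentiate a polynomial given as {power: coefficient}.

    Each term c t^k becomes (c k) t^(k-1); constant terms drop out
    because the derivative of a constant is zero.
    """
    return {k - 1: c * k for k, c in poly.items() if k != 0}

x = {7: 2, 1: -4, 0: 1}   # 2t^7 - 4t + 1, from example 4
print(differentiate(x))   # {6: 14, 0: -4}, i.e. 14t^6 - 4
```

Applying the function to `{2: 5, 1: 2}` reproduces the result of example 2.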



Example 5

> Calculate $3^{-1}$ and $3.01^{-1}$. Does this seem consistent with a
conjecture that the rule for differentiating $t^k$ holds for k < 0?

> We have $3^{-1} \approx 0.33333$ and $3.01^{-1} \approx 0.33223$, the
difference being $-1.1 \times 10^{-3}$. This suggests that the graph of
x = 1/t has a tangent line at t = 3 with a slope of about

$\frac{-1.1 \times 10^{-3}}{0.01} = -0.11$

If the rule for differentiating $t^k$ were to hold, then we would have
$\dot{x} = -t^{-2}$, and evaluating this at t = 3 would give -1/9,
which is indeed about -0.11. Yes, the rule does appear to hold for
negative k, although this numerical check does not constitute a proof.
A proof is given in example 10 on p. 27.
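The numerical check in example 5 can be repeated with smaller and smaller steps. This is a sketch only; the helper names are invented for the illustration, and like the example itself it is evidence rather than a proof.

```python
def reciprocal(t):
    return 1 / t

def slope_estimate(f, t, dt):
    """Rise over run between the points at t and t + dt."""
    return (f(t + dt) - f(t)) / dt

for dt in (0.01, 0.001, 0.0001):
    # As dt shrinks, the estimates settle toward the value the rule predicts.
    print(dt, slope_estimate(reciprocal, 3.0, dt))

print("rule predicts", -1 / 3.0 ** 2)
```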

The second derivative 

I described how Galileo and Newton found that an object subject to an
external force, starting from rest, would have a velocity $\dot{x}$
that was proportional to t, and a position x that varied like $t^2$.
The proportionality constant for the velocity is called the
acceleration, a, so that $\dot{x} = at$ and $x = at^2/2$. For example,
a sports car accelerating from a stop sign would have a large
acceleration, and its velocity at a given time would therefore be a
large number. The acceleration can be thought of as the derivative of
the derivative of x, written $\ddot{x}$, with two dots. In our example,
$\ddot{x} = a$. In general, the acceleration doesn't need to be
constant. For example, the sports car will eventually have to stop
accelerating, perhaps because the backward force of air friction
becomes as great as the force pushing it forward. The total force
acting on the car would then be zero, and the car would continue in
motion at a constant speed.

Example 6
Suppose the pilot of a blimp has just turned on the motor that runs
its propeller, and the propeller is spinning up. The resulting force
on the blimp is therefore increasing steadily, and let's say that this
causes the blimp to have an acceleration $\ddot{x} = 3t$, which
increases steadily with time. We want to find the blimp's velocity and
position as functions of time.

For the velocity, we need a polynomial whose derivative is $3t$. We
know that the derivative of $t^2$ is $2t$, so we need to use a function
that's bigger by a factor of 3/2: $\dot{x} = (3/2)t^2$. In fact, we
could add any constant to this, and make it $\dot{x} = (3/2)t^2 + 14$,
for example, where the 14 would represent the blimp's initial
velocity. But since the blimp has been sitting dead in the air until
the motor started working, we can assume the initial velocity was
zero. Remember, any time you're working backwards like this to find a
function whose derivative is some other function (integrating, in
other words), there is the possibility of adding on a constant like
this.

Finally, for the position, we need something whose derivative is
$(3/2)t^2$. The derivative of $t^3$ would be $3t^2$, so we need
something half as big as this: $x = t^3/2$.
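One way to gain confidence in an integration like the one in example 6 is to differentiate the answer and see whether the starting function comes back. The sketch below does this numerically with two-point slope estimates; the step size `h` and the sample times are arbitrary choices.

```python
def position(t):
    return t ** 3 / 2

def velocity(t):
    return 1.5 * t ** 2

def derivative(f, t, h=1e-4):
    """Estimate the slope of f at t from two nearby points."""
    return (f(t + h) - f(t - h)) / (2 * h)

for t in (0.0, 1.0, 2.0):
    # Columns: slope of position vs. velocity, slope of velocity vs. 3t.
    print(t, derivative(position, t), velocity(t),
          derivative(velocity, t), 3 * t)
```

Adding a constant such as 14 to `velocity` would leave its slope unchanged, which is the ambiguity the example points out.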

The second derivative can be interpreted as a measure of the curvature
of the graph, as shown in figure k. The graph of the function x = 2t
is a line, with no curvature. Its first derivative is 2, and its
second derivative is zero. The function $t^2$ has a second derivative
of 2, and the more tightly curved function $7t^2$ has a bigger second
derivative, 14.

k / The functions $2t$, $t^2$, and $7t^2$.

Positive and negative signs of the second derivative indicate
concavity. In figure l, the function $t^2$ is like a cup with its mouth
pointing up. We say that it's "concave up," and this corresponds to
its positive second derivative. The function $3 - t^2$, with a second
derivative less than zero, is concave down. Another way of saying it
is that if you're driving along a road shaped like $t^2$, going in the
direction of increasing t, then your steering wheel is turned to the
left, whereas on a road shaped like $3 - t^2$ it's turned to the right.

l / The functions $t^2$ and $3 - t^2$.

m / The function $t^3$ has an inflection point at t = 0.

Figure m shows a third possibility. The function $t^3$ has a derivative
$3t^2$, which equals zero at t = 0. This is called a point of
inflection. The concavity of the graph is down on the left, up on the
right. The inflection point is where it switches from one concavity to
the other. In the alternative description in terms of the steering
wheel, the inflection point is where your steering wheel is crossing
from left to right.

1.3 Applications 

Maxima and minima 

When a function goes up and then smoothly turns around and comes back
down again, it has zero slope at the top. A place where $\dot{x} = 0$,
then, could represent a place where x was at a maximum. On the other
hand, it could be concave up, in which case we'd have a minimum.



Example 7

> Fred receives a mysterious e-mail tip telling him that his
investment in a certain stock will have a value given by
$x = -2t^4 + (6.4577 \times 10^{10})t$, where t > 2005 is the year.
Should he sell at some point? If so, when?

> If the value reaches a maximum at some time, then the derivative
should be zero then. Taking the derivative and setting it equal to
zero, we have

$0 = -8t^3 + 6.4577 \times 10^{10}$

$t = \left(\frac{6.4577 \times 10^{10}}{8}\right)^{1/3}$

$t = 2006.0$

A cube root has only one real value, so this is the only candidate,
and it lies within the range t > 2005 for which the tip claimed the
function would be valid.

Should Fred sell on New Year's eve of 2006?

But this could be a maximum, a minimum, or an inflection point. Fred
definitely does not want to sell at t = 2006 if it's a minimum! To
check which of the three possibilities holds, Fred takes the second
derivative:

$\ddot{x} = -24t^2$

Plugging in t = 2006.0, we find that the second derivative is negative
at that time, so it is indeed a maximum.

Implicit in this whole discussion was the assumption that the maximum
or minimum occurred where the function was smooth. There are some
other possibilities.

In figure n, the function's minimum occurs at an end-point of its
domain.
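The arithmetic of example 7 can be reproduced in a few lines. This sketch uses the value function from the tip; the variable names are choices made here, and the comparison with neighboring years is an extra sanity check that is not part of the example.

```python
c = 6.4577e10

def value(t):
    return -2 * t ** 4 + c * t

def first_derivative(t):
    return -8 * t ** 3 + c

def second_derivative(t):
    return -24 * t ** 2

# Solve 0 = -8 t^3 + c for t by taking a cube root.
t_peak = (c / 8) ** (1 / 3)
print(round(t_peak, 1))

# A negative second derivative marks the point as a maximum.
print(second_derivative(t_peak) < 0)
print(value(t_peak) > value(t_peak - 1), value(t_peak) > value(t_peak + 1))
```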



n / The function $x = \sqrt{t}$ has a minimum at t = 0, which is not a
place where $\dot{x} = 0$. This point is the edge of the function's
domain.

Another possibility is that the function can have a minimum or maximum
at some point where its derivative isn't well defined. Figure o shows
such a situation. There is a kink in the function at t = 0, so a wide
variety of lines could be placed through the graph there, all with
different slopes and all staying on one side of the graph. There is no
uniquely defined tangent line, so the derivative is undefined.

o / The function $x = |t|$ has a minimum at t = 0, which is not a
place where $\dot{x} = 0$. This is a point where the function isn't
differentiable.

Example 8

> Rancher Rick has a length of cyclone fence L with which to enclose a
rectangular pasture. Show that he can enclose the greatest possible
area by forming a square with sides of length L/4.

> If the width and length of the rectangle are t and u, and Rick is
going to use up all his fencing material, then the perimeter of the
rectangle, 2t + 2u, equals L, so for a given width, t, the length is
u = L/2 - t. The area is a = tu = t(L/2 - t). The function only means
anything realistic for $0 \le t \le L/2$, since for values of t outside
this region either the width or the height of the rectangle would be
negative. The function a(t) could therefore have a maximum either at a
place where $\dot{a} = 0$, or at the endpoints of the function's
domain. We can eliminate the latter possibility, because the area is
zero at the endpoints.

To evaluate the derivative, we first need to reexpress a as a
polynomial:

$a = -t^2 + \frac{L}{2}t$

The derivative is

$\dot{a} = -2t + \frac{L}{2}$

Setting this equal to zero, we find t = L/4, as claimed. This is a
maximum, not a minimum or an inflection point, because the second
derivative is the constant $\ddot{a} = -2$, which is negative for all
t, including t = L/4.
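Rancher Rick's result can be double-checked by brute force, trying many widths and keeping the one that encloses the most area. This is a sketch under assumed numbers: the fence length L = 100 and the grid of 1001 candidate widths are arbitrary illustrative choices.

```python
def area(t, L):
    """Area of a rectangle of width t whose perimeter uses up a fence of length L."""
    return t * (L / 2 - t)

L = 100.0
# Candidate widths spread evenly over the realistic range 0 <= t <= L/2.
widths = [i * L / 2 / 1000 for i in range(1001)]
best = max(widths, key=lambda t: area(t, L))

print(best, L / 4, area(best, L))
```

The search lands on the same width, L/4, that the derivative gave, and the area vanishes at both ends of the range.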

Propagation of errors 

The Women's National Basketball 
Association says that balls used in 
its games should have a radius of 
11.6 cm, with an allowable range of 
error of plus or minus 0.1 cm (one 
millimeter). How accurately can 
we determine the ball's volume? 



p / How accurately can we determine the ball's volume? (The radius is
11.6 ± 0.1 cm.)

The equation for the volume of a sphere gives
$V = (4/3)\pi r^3 = 6538$ cm^3 (about six and a half liters). We have
a function V(r), and we want to know how much of an effect will be
produced on the function's output V if its input r is changed by a
certain small amount. Since the amount by which r can be changed is
small compared to r, it's reasonable to take the tangent line as an
approximation to the actual graph. The slope of the tangent line is
the derivative of V, which is $4\pi r^2$. (This is the ball's surface
area.) Setting (slope) = (rise)/(run) and solving for the rise, which
represents the change in V, we find that it could be off by as much as
$(4\pi r^2)(0.1$ cm$) = 170$ cm^3. The volume of the ball can therefore
be expressed as 6500 ± 170 cm^3, where the original figure of 6538 has
been rounded off to the nearest hundred in order to avoid creating the
impression that the 3 and the 8 actually mean anything; they clearly
don't, since the possible error is out in the hundreds' place.
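The basketball estimate can be compared with a direct recalculation of the volume. The sketch below computes both: the tangent-line estimate (surface area times the change in radius) and the actual difference between the volumes at 11.7 cm and 11.6 cm. The function names are choices made for this illustration.

```python
import math

def volume(r):
    return 4 / 3 * math.pi * r ** 3

def surface_area(r):
    """Derivative of the volume with respect to r."""
    return 4 * math.pi * r ** 2

r, dr = 11.6, 0.1   # radius and its uncertainty, in cm

tangent_estimate = surface_area(r) * dr    # slope times run
recalculated = volume(r + dr) - volume(r)  # brute-force difference

print(round(volume(r)), round(tangent_estimate), round(recalculated))
```

The two error figures agree to within a couple of cubic centimeters, which is far less than the uncertainty being estimated.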

This calculation is an example of a 



very common situation that occurs 
in the sciences, and even in every- 
day life, in which we base a calcu- 
lation on a number that has some 
range of uncertainty in it, causing a 
corresponding range of uncertainty 
in the final result. This is called 
propagation of errors. The idea is 
that the derivative expresses how 
sensitive the function's output is to 
its input. 

The example of the basketball 
could also have been handled with- 
out calculus, simply by recalculat- 
ing the volume using a radius that 
was raised from 11.6 to 11.7 cm, 
and finding the difference between 
the two volumes. Understanding it 
in terms of calculus, however, gives 
us a different way of getting at the 
same ideas, and often allows us to 
understand more deeply what's go- 
ing on. For example, we noticed in 
passing that the derivative of the 
volume was simply the surface area 
of the ball, which provides a nice 
geometric visualization. We can 
imagine inflating the ball so that 
its radius is increased by a millime- 
ter. The amount of added volume 
equals the surface area of the ball 
multiplied by one millimeter, just 
as the amount of volume added to 
the world's oceans by global warm- 
ing equals the oceans' surface area 
multiplied by the added depth. 

For an example of an insight 
that we would have missed if we 
hadn't applied calculus, consider 
how much error is incurred in the 
measurement of the width of a
book if the ruler is placed on the
book at a slightly incorrect angle,
so that it doesn't form an angle
of exactly 90 degrees with the spine.
The measurement has its minimum 
(and correct) value if the ruler is 
placed at exactly 90 degrees. Since 
the function has a minimum at 
this angle, its derivative is zero. 
That means that we expect essen- 
tially no error in the measurement 
if the ruler's angle is just a tiny 
bit off. This gives us the insight 
that it's not worth fiddling exces- 
sively over the angle in this mea- 
surement. Other sources of error 
will be more important. For exam- 
ple, is the book a uniform rectan- 
gle? Are we using the worn end of 
the ruler as its zero, rather than 
letting the ruler hang over both 
sides of the book and subtracting 
the two measurements?

Problems

1 Graph the function $t^2$ in the neighborhood of t = 3, draw a
tangent line, and use its slope to verify that the derivative equals
$2t$ at this point. > Solution, p. 162

2 Graph the function $\sin e^t$ in the neighborhood of t = 0, draw a
tangent line, and use its slope to estimate the derivative. Answer:
0.5403023058. (You will of course not get an answer this precise using
this technique.)

> Solution, p. 162

3 Differentiate the following functions with respect to t:
$1, 7, t, 7t, t^2, 7t^2, t^3, 7t^3$.

> Solution, p. 163

4 Differentiate $3t^7 - 4t^2 + 6$ with respect to t.
> Solution, p. 163

5 Differentiate at 2 + bt + c with 
respect to t. 

> Solution, p. 163 [Thompson, 1919] 

6 Find two different functions 
whose derivatives are the constant 
3, and give a geometrical interpre- 
tation. > Solution, p. 163 

7 Find a function x whose derivative is $\dot{x} = t^7$. In other
words, integrate the given function. > Solution, p. 164

8 Find a function x whose derivative is $\dot{x} = 3t^7$. In other
words, integrate the given function. > Solution, p. 164

9 Find a function x whose derivative is $\dot{x} = 3t^7 - 4t^2 + 6$.
In other words, integrate the given function. > Solution, p. 164

10 Let t be the time that has 
elapsed since the Big Bang. In 
that time, one would imagine that 
light, traveling at speed c, has been 
able to travel a maximum distance 
ct. (In fact the distance is several 
times more than this, because ac- 
cording to Einstein's theory of gen- 
eral relativity, space itself has been 
expanding while the ray of light 
was in transit.) The portion of 
the universe that we can observe 
would then be a sphere of radius ct, with volume
$v = (4/3)\pi r^3 = (4/3)\pi (ct)^3$. Compute the rate $\dot{v}$ at
which the observable universe is increasing, and check that your
answer has the right units, as in example 3 on page 14.

> Solution, p. 164 

11 Kinetic energy is a measure 
of an object's quantity of motion; 
when you buy gasoline, the energy 
you're paying for will be converted 
into the car's kinetic energy (actu- 
ally only some of it, since the en- 
gine isn't perfectly efficient). The 
kinetic energy of an object with 
mass m and velocity v is given by 
$K = (1/2)mv^2$. For a car accelerating at a steady rate, with v = at,
find the rate $\dot{K}$ at which the engine is required to put out kinetic
energy. $\dot{K}$, with units of energy
over time, is known as the power.
Check that your answer has the 
right units, as in example 3 on page 
14. > Solution, p. 164 
12 A metal square expands and contracts with temperature, the lengths of its sides varying according to the equation $\ell = (1 + \alpha T)\ell_0$. Find the rate of change
of its surface area with respect to 
temperature. That is, find $\dot{\ell}$, where
the variable with respect to which 
you're differentiating is the tem- 
perature, T . Check that your an- 
swer has the right units, as in ex- 
ample 3 on page 14. 

> Solution, p. 165

13 Find the second derivative of $2t^3 - t$. > Solution, p. 165



14 Locate any points of inflection of the function $t^3 + t^2$. Verify by graphing that the concavity of the function reverses itself at this point. > Solution, p. 165

15 Let's see if the rule that the derivative of $t^k$ is $kt^{k-1}$ also works for k < 0. Use a graph to test one particular case, choosing one particular negative value of k, and one particular value of t. If it works, what does that tell you about the rule? If it doesn't work? > Solution, p. 165

16 Two atoms will interact via 
electrical forces between their pro- 
tons and electrons. To put them 
at a distance r from one another 
(measured from nucleus to nu- 
cleus), a certain amount of energy 
E is required, and the minimum 
energy occurs when the atoms are 
in equilibrium, forming a molecule. 
Often a fairly good approximation to the energy is the Lennard-Jones expression

$$E(r) = k\left[\left(\frac{a}{r}\right)^{12} - 2\left(\frac{a}{r}\right)^{6}\right],$$

where k and a are constants. Note that, as proved in chapter 2, the rule that the derivative of $t^k$ is $kt^{k-1}$ also works for k < 0. Show that there is an equilibrium at r = a. Verify (either by graphing or by testing the second derivative) that this is a minimum, not a maximum or a point of inflection. > Solution, p. 167

17 Prove that the total number 
of maxima and minima possessed 
by a third-order polynomial is at 
most two. > Solution, p. 168

18 Euclid proved that the volume of a pyramid equals (1/3)bh, where b is the area of its base, and h its height. A pyramidal tent without tent-poles is erected by blowing air into it under pressure. The area of the base is easy to measure accurately, because the base is nailed down, but the height fluctuates somewhat and is hard to measure accurately. If the amount of uncertainty in the measured height is plus or minus $\epsilon_h$, find the amount of possible error $\epsilon_V$ in the volume. > Solution, p. 168

19 A hobbyist is going to mea- 
sure the height to which her model 
rocket rises at the peak of its tra- 
jectory. She plans to take a digi- 
tal photo from far away and then 
do trigonometry to determine the 
height, given the baseline from the 
launchpad to the camera and the
angular height of the rocket as 
determined from analysis of the 
photo. Comment on the error in- 
curred by the inability to snap the 
photo at exactly the right moment. 

> Solution, p. 168 

20 Prove, as claimed on p. 10, that if the sum $1^2 + 2^2 + \ldots + n^2$ is a polynomial, it must be of third order, and the coefficient of the $n^3$ term must be 1/3. > Solution, p. 169
2 To infinity and beyond!

a / Gottfried Leibniz (1646-1716)

Little kids readily pick up the idea 
of infinity. "When I grow up, 
I'm gonna have a million Barbies." 
"Oh yeah? Well, I'm gonna have 
a billion." "Well, I'm gonna have 
infinity Barbies." "So what? I'll 
have two infinity of them." Adults 
laugh, convinced that infinity, ∞,
is the biggest number, so 2∞ can't
be any bigger. This is the idea be- 
hind a joke in the movie Toy Story. 
Buzz Lightyear's slogan is "To in- 
finity — and beyond!" We assume 
there isn't any beyond. Infinity is 
supposed to be the biggest there 
is, so by definition there can't be 
anything bigger, right? 

2.1 Infinitesimals 

Actually mathematicians have in- 
vented many different logical sys- 



tems for working with infinity, and 
in most of them infinity does come 
in different sizes and flavors. New- 
ton, as well as the German mathe- 
matician Leibniz who invented cal- 
culus independently, 1 had a strong 
intuitive idea that calculus was re- 
ally about numbers that were in- 
finitely small: infinitesimals, the 
opposite of infinities. For instance, 
consider the number $1.1^2 = 1.21$.
That 2 in the first decimal place
is the same 2 that appears in the
expression 2t for the derivative of
$t^2$.







b / A close-up view of the function $x = t^2$, showing the line that connects the points (1,1) and (1.1, 1.21).



1 There is some dispute over this point. 
Newton and his supporters claimed that 
Leibniz plagiarized Newton's ideas, and 
merely invented a new notation for them. 
Figure b shows the idea visually.
The line connecting the points 
(1,1) and (1.1,1.21) is almost in- 
distinguishable from the tangent 
line on this scale. Its slope is 
(1.21 - 1)/(1.1 - 1) = 2.1, which 
is very close to the tangent line's 
slope of 2. It was a good approx- 
imation because the points were 
close together, separated by only 
0.1 on the t axis. 

If we needed a better approximation, we could try calculating $1.01^2 = 1.0201$. The slope of the line connecting the points (1,1) and (1.01, 1.0201) is 2.01, which is even closer to the slope of the tangent line.
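The same arithmetic can be repeated for smaller and smaller separations between the two points. The following is only a minimal sketch in Python; the helper names `secant_slope` and `square` are made up for this illustration and do not come from the book.

```python
def secant_slope(f, t, dt):
    """Slope of the line through (t, f(t)) and (t + dt, f(t + dt))."""
    return (f(t + dt) - f(t)) / dt


def square(t):
    return t * t


# Shrink the separation between the two points and watch the slope
# settle toward the tangent line's slope of 2 at t = 1.
for dt in (0.1, 0.01, 0.001):
    print(dt, secant_slope(square, 1.0, dt))
```

Up to floating-point rounding, the printed slopes are 2.1, 2.01, and 2.001: the excess over 2 is just the separation itself.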

Another method of visualizing the idea is that we can interpret $x = t^2$ as the area of a square with sides of length t, as suggested in figure c. We increase t by an infinitesimally small number dt. The d is Leibniz's notation for a very small difference, and dt is to be read as a single symbol, "dee-tee," not as a number d multiplied by a number t. The idea is that dt is smaller than any ordinary number you could imagine, but it's not zero. The area of the square is increased by $dx = 2t\,dt + dt^2$, which is analogous to the finite numbers 0.21 and 0.0201 we calculated earlier. Where before we divided by a finite change in t such as 0.1 or 0.01, now we divide by dt, producing

$$\frac{dx}{dt} = \frac{2t\,dt + dt^2}{dt} = 2t + dt$$

for the derivative. On a graph like figure b, dx/dt is the slope of the tangent line: the change in x divided by the change in t.

c / A geometrical interpretation of the derivative of $t^2$.

But adding an infinitesimal num-
ber dt onto 2t doesn't really change
it by any amount that's even the- 
oretically measurable in the real 
world, so the answer is really 2t. 
Evaluating it at t = 1 gives the 
exact result, 2, that the earlier 
approximate results, 2.1 and 2.01, 
were getting closer and closer to. 

Example 9
To show the power of infinitesimals and the Leibniz notation, let's prove that the derivative of $t^3$ is $3t^2$:

$$\frac{dx}{dt} = \frac{(t + dt)^3 - t^3}{dt} = \frac{3t^2\,dt + 3t\,dt^2 + dt^3}{dt} = 3t^2 + \ldots,$$

where the dots indicate infinitesimal terms that we can neglect.
This result required significant
sweat and ingenuity when proved 
on page 136 by the methods of 
chapter 1, and not only that 
but the old method would have 
required a completely different 
method of proof for a function that 
wasn't a polynomial, whereas the 
new one can be applied more gen- 
erally, as we'll see presently in ex- 
amples 10-13. 

It's easy to get the mistaken impression that infinitesimals exist in some remote fairyland where we can never touch them. This may be true in the same artsy-fartsy sense that we can never truly understand $\sqrt{2}$, because its decimal expansion goes on forever, and we therefore can never compute it exactly. But in practical work, that doesn't stop us from working with $\sqrt{2}$. We just approximate it as, e.g., 1.41. Infinitesimals are no more or less mysterious than irrational numbers, and in particular we can represent them concretely on a computer. If you go to lightandmatter.com/calc/inf, you'll find a web-based calculator called Inf, which can handle infinite and infinitesimal numbers. It has a built-in symbol, d, which represents an infinitesimally small number such as the dx's and dt's we've been handling symbolically.

Let's use Inf to verify that the derivative of $t^3$, evaluated at t = 1, is equal to 3, as found by plugging in to the result of example 9. The : symbol is the prompt that shows you Inf is ready to accept your typed input.

: ((1+d)^3-1)/d
3+3d+d^2

As claimed, the result is 3, or close 
enough to 3 that the infinitesimal 
error doesn't matter in real life. It 
might look like Inf did this exam- 
ple by using algebra to simplify the 
expression, but in fact Inf doesn't 
know anything about algebra. One 
way to see this is to use Inf to com- 
pare d with various real numbers: 

: d<1
true
: d<0.01
true
: d<0.0000001
true
: d<0
false

If d were just a variable being 
treated according to the axioms of 
algebra, there would be no way to 
tell how it compared with other 
numbers without having some spe- 
cial information. Inf doesn't know 
algebra, but it does know that d 
is a positive number that is less 
than any positive real number that 
can be represented using decimals 
or scientific notation. 
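To make it less mysterious that a computer can hold such a number, here is a toy model in Python. It is emphatically not how Inf is implemented, which the text doesn't describe; the class name `Inft` and its methods are hypothetical, and the model only handles finite sums of the form c0 + c1 d + c2 d^2 + ..., with d assumed to be a positive infinitesimal.

```python
from fractions import Fraction
from itertools import zip_longest


class Inft:
    """c[0] + c[1]*d + c[2]*d**2 + ..., where d is a positive infinitesimal."""

    def __init__(self, *coeffs):
        self.c = [Fraction(x) for x in coeffs] or [Fraction(0)]

    def __add__(self, other):
        pairs = zip_longest(self.c, other.c, fillvalue=Fraction(0))
        return Inft(*(a + b for a, b in pairs))

    def __sub__(self, other):
        pairs = zip_longest(self.c, other.c, fillvalue=Fraction(0))
        return Inft(*(a - b for a, b in pairs))

    def __mul__(self, other):
        out = [Fraction(0)] * (len(self.c) + len(other.c) - 1)
        for i, a in enumerate(self.c):
            for j, b in enumerate(other.c):
                out[i + j] += a * b
        return Inft(*out)

    def divide_by_d(self):
        # Only allowed when there is no ordinary real part left to divide.
        if self.c[0] != 0:
            raise ValueError("result would be infinite")
        return Inft(*self.c[1:])

    def __lt__(self, other):
        # The lowest power of d with a nonzero coefficient decides the sign,
        # because each higher power is infinitely smaller than the one before.
        for x in (self - other).c:
            if x != 0:
                return x < 0
        return False


d = Inft(0, 1)
one = Inft(1)
quotient = ((one + d) * (one + d) * (one + d) - one).divide_by_d()
print(quotient.c)                      # coefficients of 1, d, d**2
print(d < Inft(Fraction(1, 10**7)))    # d is below any positive real
print(d < Inft(0))                     # but d is not negative
```

The comparison rule is the whole trick: no algebra is needed to decide that d is less than 0.0000001, only the convention that a term in d can never outweigh a nonzero real term.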

Example 10
In example 5 on p. 15, we made a rough numerical check to see if the differentiation rule $t^k \to kt^{k-1}$, which was proved on p. 136 for k = 1, 2, 3, ..., was also valid for k = -1, i.e., for the function x = 1/t. Let's look for an actual proof. To find a natural method of attack, let's first redo the numerical check in a slightly more suggestive form. Again approximating the derivative at t = 3, we have

$$\frac{dx}{dt} \approx \frac{\frac{1}{3.01} - \frac{1}{3}}{0.01}.$$

Let's apply the grade-school technique for subtracting fractions, in which we first get them over the same denominator:

$$\frac{1}{3.01} - \frac{1}{3} = \frac{3 - 3.01}{3 \times 3.01}.$$

The result is

$$\frac{dx}{dt} \approx \left(\frac{-0.01}{3 \times 3.01}\right)\left(\frac{1}{0.01}\right) = -\frac{1}{3 \times 3.01}.$$

Replacing 3 with t and 0.01 with dt, this becomes

$$\frac{dx}{dt} = -\frac{1}{t(t + dt)} = -t^{-2} + \ldots$$
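As a numerical sanity check on this algebra, one can compare the raw difference quotient for 1/t with the common-denominator form and with the claimed derivative. This is a minimal Python sketch using ordinary floating-point numbers, so dt is small but finite rather than infinitesimal; the function names are invented for the illustration.

```python
def difference_quotient(t, dt):
    """Change in 1/t divided by the change dt in t."""
    return (1.0 / (t + dt) - 1.0 / t) / dt


def common_denominator_form(t, dt):
    """The same quantity after combining the two fractions."""
    return -1.0 / (t * (t + dt))


t = 3.0
for dt in (0.01, 0.0001, 0.000001):
    # The first two columns agree up to rounding; both approach -1/t**2.
    print(dt, difference_quotient(t, dt), common_denominator_form(t, dt), -t ** -2)
```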



Example 11
The derivative of x = sin t, with t in units of radians, is

$$\frac{dx}{dt} = \frac{\sin(t + dt) - \sin t}{dt},$$

and with the trig identity $\sin(\alpha + \beta) = \sin\alpha\cos\beta + \cos\alpha\sin\beta$, this becomes

$$= \frac{\sin t\cos dt + \cos t\sin dt - \sin t}{dt}.$$

Applying the small-angle approximations $\sin u \approx u$ and $\cos u \approx 1$, we have

$$\frac{dx}{dt} = \frac{\cos t\,dt}{dt} = \cos t + \ldots,$$

where "..." represents the error caused by the small-angle approximations.

d / Graphs of sin t, and its derivative cos t.

This is essentially all there is to the computation of the derivative, except for the remaining technical point that we haven't proved that the small-angle approximations are good enough. In example 9 on page 26, when we calculated the derivative of $t^3$, the resulting expression for the quotient dx/dt came out in a form in which we could inspect the "..." terms and verify before discarding them that they were infinitesimal. The issue is less trivial in the present example. This point is addressed more rigorously on page 137.

Figure d shows the graphs of the function and its derivative. Note how the two graphs correspond. At t = 0, the slope of sin t is at its largest, and is positive; this is where the derivative, cos t, attains its maximum positive value of 1. At t = π/2, sin t has reached a maximum, and has a slope of zero; cos t is zero here. At t = π, in the middle of the graph, sin t has its maximum negative slope, and cos t is at its most negative extreme of -1.

Physically, sin t could represent the 
position of a pendulum as it moved 
back and forth from left to right, and 
cos t would then be the pendulum's 
velocity. 
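The correspondence between the two graphs can also be spot-checked numerically. The sketch below, in Python with a small finite dt standing in for the infinitesimal, is only an illustration; `sine_difference_quotient` is a name chosen here, not something from the text.

```python
import math


def sine_difference_quotient(t, dt):
    """(sin(t + dt) - sin t) / dt for a small but finite dt."""
    return (math.sin(t + dt) - math.sin(t)) / dt


# The three points singled out in the discussion of figure d.
for t in (0.0, math.pi / 2, math.pi):
    estimate = sine_difference_quotient(t, 1e-6)
    print(f"t = {t:.4f}: quotient = {estimate:.6f}, cos t = {math.cos(t):.6f}")
```

A check like this supports the result but, as with the small-angle approximations themselves, it is not a proof.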



Example 12
What about the derivative of the cosine? The cosine and the sine are really the same function, shifted to the left or right by π/2. If the derivative of the sine is the same as itself, but shifted to the left by π/2, then the derivative of the cosine must be a cosine shifted to the left by π/2:

$$\frac{d\cos t}{dt} = \cos(t + \pi/2) = -\sin t$$


The next example will require a 
little trickery. By the end of this 
chapter you'll learn general tech- 
niques for cranking out any deriva- 
tive cookbook-style, without hav- 
ing to come up with any tricks. 



Example 13
> Find the derivative of 1/(1 - t), evaluated at t = 0.

> The graph shows what the function looks like. It blows up to infinity at t = 1, but it's well behaved at t = 0, where it has a positive slope.

e / The function x = 1/(1 - t).

For insight, let's calculate some points on the curve. The point at which we're differentiating is (0,1). If we put in a small, positive value of t, we can observe how much the result increases relative to 1, and this will give us an approximation to the derivative. For example, we find that at t = 0.001, the function has the value 1.001001001001, and so the derivative is approximately (1.001 - 1)/(.001 - 0), or about 1. We can therefore conjecture that the derivative is exactly 1, but that's not the same as proving it.
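The numerical experiment just described is easy to repeat for several small values of t. Here is a minimal Python sketch that uses exact fractions so that no rounding obscures the pattern; the names `x_of` and `slope_from_zero` are hypothetical.

```python
from fractions import Fraction


def x_of(t):
    """The function x = 1/(1 - t)."""
    return 1 / (1 - t)


def slope_from_zero(t):
    """Slope of the line from (0, 1) to (t, x(t))."""
    return (x_of(t) - 1) / t


for t in (Fraction(1, 10), Fraction(1, 1000), Fraction(1, 100000)):
    # Print t, the function value, and the estimated slope as decimals.
    print(t, float(x_of(t)), float(slope_from_zero(t)))
```

The slopes creep toward 1 as t shrinks, which is evidence for the conjecture but still not a proof.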

But let's take another look at that number 1.001001001001. It's clearly a repeating decimal. In other words, it appears that

$$\frac{1}{1 - 1/1000} = 1 + \frac{1}{1000} + \left(\frac{1}{1000}\right)^2 + \ldots,$$

and we can easily verify this by multiplying both sides of the equation by 1 - 1/1000 and collecting like powers. This is a special case of the geometric series

$$\frac{1}{1 - t} = 1 + t + t^2 + \ldots,$$

which can be derived 2 by doing synthetic division (the equivalent of long division for polynomials), or simply verified, after forming the conjecture based on the numerical example above, by multiplying both sides by 1 - t.

As we'll see in section 2.2, and have been implicitly assuming so far, infinitesimals obey all the same elementary laws of algebra as the real numbers, so the above derivation also holds for an infinitesimal value of t. We can verify the result using Inf:

: 1/(1-d)
1+d+d^2+d^3+d^4

Notice, however, that the series is truncated after the first five terms. This is similar to the truncation that happens when you ask your calculator to find $\sqrt{2}$ as a decimal.

The result for the derivative is

$$\frac{dx}{dt} = \frac{1 + dt + dt^2 + \ldots - 1}{1 + dt - 1} = 1 + \ldots$$

2 As a technical aside, it's not necessary for our present purposes to go into the issue of how to make the most general possible definition of what is meant by a sum like this one which has an infinite number of terms; the only fact we'll need here is that the error in the finite sum obtained by leaving out the "..." has only higher powers of t. This is taken up in more detail in ch. 7. Note that the series only gives the right answer for |t| < 1. E.g., for t = 1, it equals 1 + 1 + 1 + ..., which, if it means anything, clearly means something infinite.

2.2 Safe use of infinitesimals

The idea of infinitesimally small numbers has always irked purists.




f / Bishop George Berkeley (1685-1753)

One prominent critic of the cal- 
culus was Newton's contemporary 
George Berkeley, the Bishop of 
Cloyne. Although some of his 
complaints are clearly wrong (he 
denied the possibility of the sec- 
ond derivative), there was clearly 
something to his criticism of the 
infinitesimals. He wrote sarcas- 
tically, "They are neither finite 
quantities, nor quantities infinitely 
small, nor yet nothing. May we not 
call them ghosts of departed quan- 
tities?" 

Infinitesimals seemed scary, be- 
cause if you mishandled them, you 
could prove absurd things. For 
example, let du be an infinitesimal. Then 2du is also infinitesimal. Therefore both 1/du and 1/(2du) equal infinity, so 1/du = 1/(2du). Multiplying by du on
both sides, we have a proof that 
1 = 1/2. 

In the eighteenth century, the use 
of infinitesimals became like adul- 
tery: commonly practiced, but 
shameful to admit to in polite cir- 
cles. Those who used them learned 
certain rules of thumb for handling 
them correctly. For instance, they 
would identify the flaw in my proof of 1 = 1/2 as my assumption that there was only one size of infinity, when actually 1/du should be interpreted as an infinity twice as big as 1/(2du). The use of the symbol ∞ played into this trap, because the use of a single symbol
for infinity implied that infinities 
only came in one size. However, 
the practitioners of infinitesimals 
had trouble articulating a clear 
set of principles for their proper 
use, and couldn't prove that a self- 
consistent system could be built 
around them. 

By the twentieth century, when 
I learned calculus, a clear con- 
sensus had formed that infinite 
and infinitesimal numbers weren't 
numbers at all. A notation like dx/dt, my calculus teacher told me, wasn't really one number divided by another, it was merely a symbol for something called a limit,

$$\lim_{\Delta t \to 0} \frac{\Delta x}{\Delta t},$$

where Δx and Δt represented finite changes. I'll give a formal definition (actually two different formal definitions) of the term "limit" in section 3.2, but intuitively the concept is that we can get as good an approximation to the derivative as we like, provided that we make Δt small enough.

That satisfied me until we got to 
a certain topic (implicit differentiation) in which we were encouraged to break the dx away from
the dt, leaving them on opposite
sides of the equation. I button- 
holed my teacher after class and 
asked why he was now doing what 
he'd told me you couldn't really 
do, and his response was that dx 
and dt weren't really numbers, but 
most of the time you could get 
away with treating them as if they 
were, and you would get the right 
answer in the end. Most of the 
time!? That bothered me. How 
was I supposed to know when it 
wasn't "most of the time?" 




g / Abraham Robinson (1918-1974)

But unknown to me and my 
teacher, mathematician Abraham 
Robinson had already shown in the 
1960's that it was possible to con- 
struct a self-consistent number sys- 
tem that included infinite and in- 
finitesimal numbers. He called it 
the hyperreal number system, and 
it included the real numbers as a 
subset. 3 



3 The main text of this book treats infinitesimals with the minimum fuss necessary in order to avoid the common goofs. More detailed discussions are often relegated to the back of the book, as in example 11 on page 28. The reader who wants to learn even more about the hyperreal system should consult the list of further reading on page 195.

Moreover, the rules for what you can and can't do with the hyperreals turn out to be extremely simple. Take any true statement about the real numbers. Suppose it's possible to translate it into a statement about the hyperreals in the most obvious way, simply by replacing the word "real" with the word "hyperreal." Then the translated statement is also true. This is known as the transfer principle.

Let's look back at my bogus proof of 1 = 1/2 in light of this simple principle. The final step of the proof, for example, is perfectly valid: multiplying both sides of the equation by the same thing. The following statement about the real numbers is true:

For any real numbers a, b, and c, if a = b, then ac = bc.

This can be translated in an obvious way into a statement about the hyperreals:

For any hyperreal numbers a, b, and c, if a = b, then ac = bc.

However, what about the statement that both 1/du and 1/(2du) equal infinity, so they're equal to each other? This isn't the translation of a statement that's true about the reals, so there's no reason to believe it's true when applied to the hyperreals, and in fact it's false.

What the transfer principle tells us 
is that the real numbers as we nor- 
mally think of them are not unique 
in obeying the ordinary rules of al- 
gebra. There are completely dif- 
ferent systems of numbers, such 
as the hyperreals, that also obey 
them. 

How, then, are the hyperreals even 
different from the reals, if every- 
thing that's true of one is true of 
the other? But recall that the 
transfer principle doesn't guaran- 
tee that every statement about the 
reals is also true of the hyperre- 
als. It only works if the statement 
about the reals can be translated 
into a statement about the hyper- 
reals in the most simple, straight- 
forward way imaginable, simply by 
replacing the word "real" with the 
word "hyperreal." Here's an ex- 
ample of a true statement about 
the reals that can't be translated 
in this way: 

For any real number a, there 
is an integer n that is greater 
than a. 

This one can't be translated so 
simplemindedly, because it refers 
to a subset of the reals called 
the integers. It might be possi- 
ble to translate it somehow, but 
it would require some insight into 
the correct way to translate that 
word "integer." The transfer principle doesn't apply to this statement, which indeed is false for the
hyperreals, because the hyperre- 
als contain infinite numbers that 
are greater than all the integers. 
In fact, the contradiction of this 
statement can be taken as a def- 
inition of what makes the hyper- 
reals special, and different from 
the reals: we assume that there is 
at least one hyperreal number, H, 
which is greater than all the inte- 
gers. 

As an analogy from everyday life, 
consider the following statements 
about the student body of the high 
school I attended: 

1. Every student at my high 
school had two eyes and a face. 

2. Every student at my high 
school who was on the football 
team was a jerk. 

Let's try to translate these into 
statements about the population 
of California in general. The stu- 
dent body of my high school is like 
the set of real numbers, and the 
present-day population of Califor- 
nia is like the hyperreals. State- 
ment 1 can be translated mind- 
lessly into a statement that ev- 
ery Californian has two eyes and 
a face; we simply substitute "ev- 
ery Californian" for "every student 
at my high school." But state- 
ment 2 isn't so easy, because it 
refers to the subset of students 
who were on the football team, 
and it's not obvious what the cor- 
responding subset of Californians 



would be. Would it include ev- 
erybody who played high school, 
college, or pro football? Maybe 
it shouldn't include the pros, be- 
cause they belong to an organiza- 
tion covering a region bigger than 
California. Statement 2 is the kind 
of statement that the transfer prin- 
ciple doesn't apply to. 4 

Example 14
As a nontrivial example of how to apply the transfer principle, let's consider how to handle expressions like the one that occurred when we wanted to differentiate $t^2$ using infinitesimals:

$$\frac{d(t^2)}{dt} = 2t + dt$$

I argued earlier that 2t + dt is so close to 2t that for all practical purposes, the answer is really 2t. But is it really valid in general to say that 2t + dt is the same hyperreal number as 2t? No. We can apply the transfer principle to the following statement about the reals:

For any real numbers a and b, with b ≠ 0, a + b ≠ a.

Since dt isn't zero, 2t + dt ≠ 2t.

More generally, example 14 leads us to visualize every number as being surrounded by a "halo" of numbers that don't equal it, but differ from it by only an infinitesimal amount. Just as a magnifying glass would allow you to see the fleas on a dog, you would need an infinitely strong microscope to see this halo. This is similar to the idea that every integer is surrounded by a bunch of fractions that would round off to that integer. We can define the standard part of a finite hyperreal number, which means the unique real number that differs from it infinitesimally. For instance, the standard part of 2t + dt, notated st(2t + dt), equals 2t. The derivative of a function should actually be defined as the standard part of dx/dt, but we often write dx/dt to mean the derivative, and don't worry about the distinction.

4 For a slightly more precise and formal statement of the transfer principle, see page 139.

One of the things Bishop Berkeley disliked about infinitesimals was the idea that they existed in a kind of hierarchy, with $dt^2$ being not just infinitesimally small, but infinitesimally small compared to the infinitesimal dt. If dt is the flea on a dog, then $dt^2$ is a submicroscopic flea that lives on the flea, as in Swift's doggerel: "Big fleas have little fleas/ On their backs to ride 'em,/ and little fleas have lesser fleas,/ And so, ad infinitum." Berkeley's criticism was off the mark here: there is such a hierarchy. Our basic assumption about the hyperreals was that they contain at least one infinite number, H, which is bigger than all the integers. If this is true, then 1/H must be less than 1/2, less than 1/100, less than 1/1,000,000, and indeed less than 1/n for any integer n. Therefore the hyperreals are guaranteed to include infinitesimals as well, and so we have at least three levels to the hierarchy: infinities comparable to H, finite numbers, and infinitesimals comparable to 1/H. If you can swallow that, then it's not too much of a leap to add more rungs to the ladder, like extra-small infinitesimals that are comparable to $1/H^2$. If this seems a little crazy, it may comfort you to think of statements about the hyperreals as descriptions of limiting processes involving real numbers. For instance, in the sequence of numbers $1.1^2 = 1.21$, $1.01^2 = 1.0201$, $1.001^2 = 1.002001$, ..., it's clear that the number represented by the digit 1 in the final decimal place is getting smaller faster than the contribution due to the digit 2 in the middle.

One subtle issue here, which I 
avoided mentioning in the differen- 
tiation of the sine function on page 
28, is whether the transfer princi- 
ple is sufficient to let us define all 
the functions that appear as familiar keys on a calculator: $x^2$, $\sqrt{x}$,
$\sin x$, $\cos x$, $e^x$, and so on. After
all, these functions were originally 
defined as rules that would take a 
real number as an input and give a 
real number as an output. It's not 
trivially obvious that their defini- 
tions can naturally be extended to 
take a hyperreal number as an in- 
put and give back a hyperreal as 
an output. Essentially the answer 
is that we can apply the transfer 
principle to them just as we would 
to statements about simple arithmetic, but I've discussed this a little more on page 146.

2.3 The product rule 

When I first learned calculus, it seemed to me that if the derivative of 3t was 3, and the derivative of 7t was 7, then the derivative of t multiplied by t ought to be just plain old t, not 2t. The reason there's a factor of 2 in the correct answer is that $t^2$ has two reasons to grow as t gets bigger: it grows because the first factor of t is increasing, but also because the second one is. In general, it's possible to find the derivative of the product of two functions any time we know the derivatives of the individual functions.

The product rule
If x and y are both functions of t, then the derivative of their product is

$$\frac{d(xy)}{dt} = \frac{dx}{dt}\,y + x\,\frac{dy}{dt}.$$

The proof is easy. Changing t by an infinitesimal amount dt changes the product xy by an amount

$$(x + dx)(y + dy) - xy = y\,dx + x\,dy + dx\,dy,$$

and dividing by dt makes this into

$$\frac{dx}{dt}\,y + x\,\frac{dy}{dt} + \frac{dx\,dy}{dt},$$

whose standard part is the result to be proved.

Example 15
> Find the derivative of the function t sin t.

$$\frac{d(t\sin t)}{dt} = t\cdot\frac{d(\sin t)}{dt} + \frac{dt}{dt}\cdot\sin t = t\cos t + \sin t$$

Figure h gives the geometrical interpretation of the product rule. Imagine that the king, in his castle at the southwest corner of his rectangular kingdom, sends out a line of infantry to expand his territory to the north, and a line of cavalry to take over more land to the east. In a time interval dt, the cavalry, which moves faster, covers a distance dx greater than that covered by the infantry, dy. However, the strip of territory conquered by the cavalry, y dx, isn't as great as it could have been, because in our example y isn't as big as x.

h / A geometrical interpretation of the product rule.
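The bookkeeping in the kingdom picture can be played out with concrete numbers. The following Python sketch uses exact fractions and a finite time step in place of an infinitesimal; the kingdom's dimensions and the two speeds are invented purely for illustration, as is the function name `area_growth_rate`.

```python
from fractions import Fraction


def area_growth_rate(x, y, dx_dt, dy_dt, dt):
    """Change in the area x*y during a time dt, divided by dt."""
    dx, dy = dx_dt * dt, dy_dt * dt
    return ((x + dx) * (y + dy) - x * y) / dt


x, y = Fraction(5), Fraction(3)          # hypothetical kingdom, 5 wide by 3 tall
dx_dt, dy_dt = Fraction(2), Fraction(1)  # hypothetical cavalry and infantry speeds
product_rule = dx_dt * y + x * dy_dt     # (dx/dt) y + x (dy/dt)

for dt in (Fraction(1, 10), Fraction(1, 1000), Fraction(1, 100000)):
    # What is left over after subtracting the product-rule value is the
    # corner piece dx*dy/dt, which shrinks in proportion to dt.
    leftover = area_growth_rate(x, y, dx_dt, dy_dt, dt) - product_rule
    print(dt, leftover)
```

The leftover is exactly (dx/dt)(dy/dt) dt, the little corner rectangle in figure h divided by dt, which is why it disappears when the standard part is taken.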
A helpful feature of the Leibniz
notation is that one can easily 
use it to check whether the units 
of an answer make sense. If we 
measure distances in meters and 
time in seconds, then xy has units 
of square meters (area), and so 
does the change in the area, d(xy). 
Dividing by dt gives the number 
of square meters per second be- 
ing conquered. On the right-hand 
side of the product rule, dx/dt 
has units of meters per second 
(velocity), and multiplying it by 
y makes the units square meters 
per second, which is consistent 
with the left-hand side. The units 
of the second term on the right 
likewise check out. Some begin- 
ners might be tempted to guess 
that the product rule would be 
d(xy)/dt = (dx/dt)(dy/dt), but 
the Leibniz notation instantly re- 
veals that this can't be the case, 
because then the units on the left, 
$m^2/s$, wouldn't match the ones on
the right, $m^2/s^2$.

Because this unit-checking feature 
is so helpful, there is a special way 
of writing a second derivative in 
the Leibniz notation. What Newton called $\ddot{x}$, Leibniz wrote as

$$\frac{d^2x}{dt^2}.$$

Although the different placement 
of the 2's on top and bottom seems 
strange and inconsistent to many 
beginners, it actually works out 
nicely. If x is a distance, mea- 
sured in meters, and t is a time, 



in units of seconds, then the sec- 
ond derivative is supposed to have 
units of acceleration, in units of 
meters per second per second, also 
written (m/s)/s, or $m/s^2$. (The
acceleration of falling objects on
Earth is 9.8 $m/s^2$ in these units.)
The Leibniz notation is meant to 
suggest exactly this: the top of the 
fraction looks like it has units of 
meters, because we're not squaring 
x, while the bottom of the fraction
looks like it has units of seconds, 
because it looks like we're squar- 
ing dt. Therefore the units come 
out right. It's important to realize, 
however, that the symbol d isn't a 
number (not a real one, and not a 
hyperreal one, either), so we can't 
really square it; the notation is not 
to be taken as a literal statement 
about infinitesimals. 



Example 16 
A tricky use of the product rule is to
find the derivative of $\sqrt{t}$. Since $\sqrt{t}$ can
be written as $t^{1/2}$, we might suspect
that the rule $d(t^k)/dt = kt^{k-1}$ would
work, giving a derivative $\frac{1}{2}t^{-1/2} =
1/(2\sqrt{t})$. However, the methods used
to prove that rule in ch. 1 and
on p. 136 only work if k is an integer,
so the best we could do would be to
confirm our conjecture approximately
by graphing or numerical estimation.



Using the product rule, we can write
$f(t) = d\sqrt{t}/dt$ for our unknown derivative,
and back into the result using the
product rule:

$$\frac{dt}{dt} = \frac{d(\sqrt{t}\sqrt{t})}{dt} = f(t)\sqrt{t} + \sqrt{t}f(t) = 2f(t)\sqrt{t}$$

But $dt/dt = 1$, so $f(t) = 1/(2\sqrt{t})$ as
claimed.
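The numerical estimation mentioned in example 16 can be sketched in a few lines of Python. This is only an illustrative check, not part of the proof; the helper names `forward_difference` and `sqrt_derivative` and the step size `dt` are assumptions made for this sketch.

```python
import math

def forward_difference(f, t, dt=1e-6):
    """Approximate df/dt at t using a small but finite step dt."""
    return (f(t + dt) - f(t)) / dt

def sqrt_derivative(t):
    """The formula from example 16: 1/(2*sqrt(t))."""
    return 1.0 / (2.0 * math.sqrt(t))

# Compare the finite-difference estimate with the formula at a few points.
for t in (0.25, 1.0, 4.0, 9.0):
    approx = forward_difference(math.sqrt, t)
    exact = sqrt_derivative(t)
    print(f"t={t}: finite difference {approx:.6f}, formula {exact:.6f}")
```

The two columns should agree to several decimal places, which supports the conjecture but does not prove it; the product-rule argument is what establishes the result.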



The trick used in example 16 can 
also be used to prove that the 
power rule $d(x^n)/dx = nx^{n-1}$ applies
to cases where n is an integer
less than 0, but I'll instead prove 
this on page 41 by a technique that 
doesn't depend on a trick, and also 
applies to values of n that aren't 
integers. 



2.4 The chain rule 

Figure i shows three clowns on see- 
saws. If the leftmost clown moves 
down by a distance dx, the middle 
one will come up by dy, but this 
will also cause the one on the right 
to move down by dz. If we want 
to predict how much the rightmost 
clown will move in response to a 
certain amount of motion by the 
leftmost one, we have 



$$\frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx}.$$



This relation, called the chain rule, 
allows us to calculate a derivative 
of a function defined by one func- 
tion inside another. The proof, 
given on page 147, is essentially 
just the application of the trans- 
fer principle. (As is often the case, 



the proof using the hyperreals is 
much simpler than the one using 
real numbers and limits.) 



Example 17 
> Find the derivative of the function
$z(x) = \sin(x^2)$.

> Let $y(x) = x^2$, so that $z(x) = \sin(y(x))$. Then

$$\frac{dz}{dx} = \frac{dz}{dy} \cdot \frac{dy}{dx} = \cos(y) \cdot 2x = 2x\cos(x^2)$$



The way people usually say it is that 
the chain rule tells you to take the 
derivative of the outside function, the 
sine in this case, and then multiply 
by the derivative of "the inside stuff," 
which here is the square. Once you 
get used to doing it, you don't need 
to invent a third, intermediate variable, 
as we did here with y. 
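As a rough numerical illustration of example 17, the following Python sketch estimates dz/dy and dy/dx separately, multiplies them, and compares the product with both a direct estimate of dz/dx and the formula 2x cos(x^2). The function names, the sample point `x0`, and the step size are assumptions chosen for this sketch.

```python
import math

def derivative(f, x, dx=1e-6):
    """Centered finite-difference estimate of df/dx at x."""
    return (f(x + dx) - f(x - dx)) / (2 * dx)

def y(x):
    """Inside function: y = x^2."""
    return x ** 2

def z_of_y(y_value):
    """Outside function: z = sin(y)."""
    return math.sin(y_value)

def z(x):
    """Composite function z(x) = sin(x^2)."""
    return z_of_y(y(x))

x0 = 1.3
dz_dy = derivative(z_of_y, y(x0))   # slope of the outside function, evaluated at y(x0)
dy_dx = derivative(y, x0)           # slope of the inside function
chain = dz_dy * dy_dx               # chain rule: dz/dx = (dz/dy)(dy/dx)
direct = derivative(z, x0)          # differentiate the composite directly
formula = 2 * x0 * math.cos(x0 ** 2)

print(chain, direct, formula)
```

All three printed numbers should agree to many decimal places, the small differences being due to the finite step size and rounding.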

Example 18 
Let's express the chain rule without 
the use of the Leibniz notation. Let the 
function f be defined by f(x) = g(h(x)). 
Then the derivative of f is given by 
$f'(x) = g'(h(x)) \cdot h'(x)$.

Example 19 
> We've already proved that the
derivative of $t^k$ is $kt^{k-1}$ for k = -1 (example
1 on p. 27) and for k = 1, 2, 3,
... (p. 136). Use these facts to extend
the rule to all integer values of k.

> For k < 0, the function $x = t^k$ can
be written as $x = (t^{-1})^{-k}$, where -k is
positive. Applying the chain rule, we
find $dx/dt = (-k)(t^{-1})^{-k-1}(-t^{-2}) = kt^{k-1}$.
i / Three clowns on seesaws demonstrate the chain rule.



2.5 Exponentials and 
logarithms 

The exponential

The exponential function $e^x$,
where e = 2.71828... is the base
of natural logarithms, comes up
constantly in applications as
diverse as credit-card interest, the
growth of animal populations, and
electric circuits. For its derivative
we have

$$\frac{de^x}{dx} = \frac{e^{x+dx} - e^x}{dx} = \frac{e^x e^{dx} - e^x}{dx} = e^x\,\frac{e^{dx} - 1}{dx}.$$

The second factor, $(e^{dx} - 1)/dx$,
doesn't have x in it, so it must
just be a constant. Therefore we
know that the derivative of $e^x$ is
simply $e^x$, multiplied by some unknown
constant,

$$\frac{de^x}{dx} = c\,e^x.$$

A rough check by graphing at, say,
x = 0, shows that the slope is close
to 1, so c is close to 1. Numerical
calculation also shows that,
for example, $(e^{0.001} - 1)/0.001 =
1.00050016670838$ is very close to
1. But how do we know it's exactly
one when dx is really infinitesimal?
We can use Inf:

: [exp(d)-1]/d
1+0.5d+...

(The ... indicates where I've
snipped some higher-order terms
out of the output.) It seems clear
that c is equal to 1 except for negligible
terms involving higher powers
of dx. A rigorous proof is given
on page 147.
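The numerical calculation quoted above is easy to repeat for several values of dx. The sketch below, in Python rather than Inf, evaluates the constant factor for shrinking finite dx and prints it next to 1 + 0.5dx, the leading terms of the Inf output. The function name `exp_constant` is an assumption of this sketch.

```python
import math

def exp_constant(dx):
    """The factor (e^dx - 1)/dx multiplying e^x in the derivative of e^x."""
    return (math.exp(dx) - 1.0) / dx

# As dx shrinks, the factor approaches 1 and tracks 1 + 0.5*dx ever more closely.
for dx in (0.1, 0.01, 0.001):
    print(f"dx={dx}: (e^dx - 1)/dx = {exp_constant(dx):.12f}, 1 + 0.5 dx = {1 + 0.5 * dx:.12f}")
```

This only shows the trend for finite dx; it is consistent with c = 1 but is not a substitute for the rigorous proof.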
Example 20
> The concentration of a foreign substance
in the bloodstream generally
falls off exponentially with time as $c =
c_0 e^{-t/a}$, where $c_0$ is the initial concentration,
and a is a constant. For caffeine
in adults, a is typically about 7
hours. An example is shown in figure
j. Differentiate the concentration with
respect to time, and interpret the result.
Check that the units of the result
make sense.



> Using the chain rule,

$$\frac{dc}{dt} = c_0 e^{-t/a} \cdot \left(-\frac{1}{a}\right) = -\frac{c_0}{a}e^{-t/a}$$

This can be interpreted as the rate 
at which caffeine is being removed 
from the blood and put into the per- 
son's urine. It's negative because the 
concentration is decreasing. According
to the original expression for c,
a substance with a large a will take 
a long time to reduce its concentra- 
tion, since t/a won't be very big un- 
less we have large t on top to com- 
pensate for the large a on the bottom. 
In other words, larger values of a rep- 
resent substances that the body has 
a harder time getting rid of efficiently. 
The derivative has a on the bottom, 
and the interpretation of this is that for 
a drug that is hard to eliminate, the 
rate at which it is removed from the 
blood is low. 

It makes sense that a has units of 
time, because the exponential func- 
tion has to have a unitless argument, 
so the units of t/a have to cancel out. 
The units of the result come from the
factor of $c_0/a$, and it makes sense that



the units are concentration divided by 
time, because the result represents 
the rate at which the concentration is 
changing. 







j / Example 20. A typ- 
ical graph of the con- 
centration of caffeine in 
the blood, in units of mil- 
ligrams per liter, as a 
function of time, in hours. 

Example 21 

> Find the derivative of the function 
$y = 10^x$.

> In general, one of the tricks to do- 
ing calculus is to rewrite functions in 
forms that you know how to handle. 
This one can be rewritten as a base-e 
exponent: 

$$y = 10^x$$
$$\ln y = \ln(10^x)$$
$$\ln y = x\ln 10$$
$$y = e^{x\ln 10}$$

Applying the chain rule, we have the 
derivative of the exponential, which is 
just the same exponential, multiplied 
by the derivative of the inside stuff: 



$$\frac{dy}{dx} = e^{x\ln 10} \cdot \ln 10$$

In other words, the "c" referred to in
the discussion of the derivative of $e^x$
becomes c = ln 10 in the case of the
base-10 exponential.

The logarithm 

The natural logarithm is the func- 
tion that undoes the exponential. 
In a situation like this, we have 

$$\frac{dy}{dx} = \frac{1}{dx/dy}$$

where on the left we're thinking of
y as a function of x, and on the
right we consider x to be a function
of y. Applying this to the natural
logarithm,

$$y = \ln x$$
$$x = e^y$$
$$\frac{dx}{dy} = e^y$$
$$\frac{dy}{dx} = \frac{1}{e^y} = \frac{1}{x}$$
$$\frac{d\ln x}{dx} = \frac{1}{x}.$$

This is noteworthy because it
shows that there must be an exception
to the rule that the derivative
of $x^n$ is $nx^{n-1}$, and the integral
of $x^{n-1}$ is $x^n/n$. (On page
37 I remarked that this rule could
be proved using the product rule
for negative integer values of k,
but that I would give a simpler,



k / Differentiation and integration of
functions of the form $x^n$. Constants
out in front of the functions are not
shown, so keep in mind that, for example,
the derivative of $x^2$ isn't x, it's
2x.

less tricky, and more general proof 
later. The proof is example 22 below.)
The integral of $x^{-1}$ is not
$x^0/0$, which wouldn't make sense
anyway because it involves division
by zero. 5 Likewise the derivative
of $x^0 = 1$ is $0x^{-1}$, which is
zero. Figure k shows the idea. The
functions $x^n$ form a kind of ladder,
with differentiation taking us down 
one rung, and integration taking us 
up. However, there are two special 



5 Speaking casually, one can say that 
division by zero gives infinity. This is 
often a good way to think when try- 
ing to connect mathematics to reality. 
However, it doesn't really work that way 
according to our rigorous treatment of 
the hyperreals. Consider this statement:
"For a nonzero real number a, there is
no real number b such that a = 0b." This
means that we can't divide a by 0 and get
b. Applying the transfer principle to this
statement, we see that the same is true 
for the hyperreals: division by zero is un- 
defined. However, we can divide a finite 
number by an infinitesimal, and get an 
infinite result, which is almost the same 
thing. 
cases where differentiation takes us
off the ladder entirely. 



Example 22 
> Prove $d(x^n)/dx = nx^{n-1}$ for any real
value of n, not just an integer.

> $$y = x^n = e^{n\ln x}$$

By the chain rule,

$$\frac{dy}{dx} = e^{n\ln x} \cdot \frac{n}{x} = x^n \cdot \frac{n}{x} = nx^{n-1}.$$

(For n = 0, the result is zero.) 

When I started the discussion of 
the derivative of the logarithm, I 
wrote y = lnx right off the bat. 
That meant I was implicitly assuming
x was positive. More generally,
the derivative of $\ln|x|$ equals
1/x, regardless of the sign (see
problem 27 on page 49). 
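The result of example 22 and the remark about ln|x| can be spot-checked numerically. The Python sketch below is only a sanity check under assumed sample values (x = 2 for the power rule, x = -3 for the logarithm); the helper names are hypothetical.

```python
import math

def derivative(f, x, dx=1e-6):
    """Centered finite-difference estimate of df/dx at x."""
    return (f(x + dx) - f(x - dx)) / (2 * dx)

def power_rule(x, n):
    """The result of example 22: n * x**(n - 1)."""
    return n * x ** (n - 1)

# Non-integer and negative exponents, evaluated at a positive x.
x0 = 2.0
for n in (0.5, -1.5, math.pi):
    numeric = derivative(lambda x: x ** n, x0)
    print(f"n={n:.4f}: numeric {numeric:.6f}, power rule {power_rule(x0, n):.6f}")

# ln|x| at a negative x: the slope should be 1/x, which is negative there.
x_neg = -3.0
log_abs_slope = derivative(lambda x: math.log(abs(x)), x_neg)
print(f"d ln|x|/dx at x={x_neg}: {log_abs_slope:.6f}, 1/x = {1 / x_neg:.6f}")
```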

2.6 Quotients 

So far we've been successful with 
a divide-and-conquer approach to 
differentiation: the product rule 
and the chain rule offer meth- 
ods of breaking a function down 
into simpler parts, and finding the 
derivative of the whole thing based 
on knowledge of the derivatives of 
the parts. We know how to find 
the derivatives of sums, differences, 
and products, so the obvious next 
step is to look for a way of handling 



division. This is straightforward, 
since we know that the derivative 
of the function $1/u = u^{-1}$ is $-u^{-2}$.
Let u and v be functions of x. 
Then by the product rule, 



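$$\frac{d(v/u)}{dx} = \frac{dv}{dx} \cdot \frac{1}{u} + v \cdot \frac{d(1/u)}{dx}$$

and by the chain rule,

$$\frac{d(v/u)}{dx} = \frac{1}{u}\frac{dv}{dx} - \frac{v}{u^2}\frac{du}{dx}.$$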



This is so easy to rederive on de- 
mand that I suggest not memoriz- 
ing it. 

By the way, notice how the no- 
tation becomes a little awkward 
when we want to write a derivative 
like d(v/u)/dx. When we're differ- 
entiating a complicated function, 
it can be uncomfortable trying to 
cram the expression into the top of 
the d . . . /d . . . fraction. Therefore 
it would be more common to write 
such an expression like this: 

$$\frac{d}{dx}\left(\frac{v}{u}\right)$$

This could be considered an abuse 
of notation, making d look like a 
number being divided by another 
number dx, when actually d is 
meaningless on its own. On the 
other hand, we can consider the 
symbol d/dx to represent the op- 
eration of differentiation with re- 
spect to x; such an interpretation 
will seem more natural to those 
who have been inculcated with the 
taboo against considering infinites- 
imals as numbers in the first place. 
Using the new notation, the quotient
rule becomes

$$\frac{d}{dx}\left(\frac{v}{u}\right) = \frac{1}{u}\frac{dv}{dx} - \frac{v}{u^2}\frac{du}{dx}.$$



The interpretation of the minus 
sign is that if u increases, v/u de- 
creases. 

Example 23 

> Differentiate y = x/(1 + 3x), and 
check that the result makes sense. 

> We identify v with x and u with 1 + 3x.
The result is

$$\frac{d}{dx}\left(\frac{v}{u}\right) = \frac{1}{u}\frac{dv}{dx} - \frac{v}{u^2}\frac{du}{dx} = \frac{1}{1+3x} - \frac{3x}{(1+3x)^2}$$

One way to check that the result
makes sense is to consider extreme
values of x. For very large values of x, 
the 1 on the bottom of x/(1 + 3x) be- 
comes negligible compared to the 3x, 
and the function y approaches x/3x = 
1/3 as a limit. Therefore we expect 
that the derivative dy/dx should ap- 
proach zero, since the derivative of 
a constant is zero. It works: plug- 
ging in bigger and bigger numbers for 
x in the expression for the derivative 
does give smaller and smaller results. 
(In the second term, the denominator 
gets bigger faster than the numerator, 
because it has a square in it.) 

Another way to check the result is to 
verify that the units work out. Sup- 
pose arbitrarily that x has units of gal- 
lons. (If the 3 on the bottom is unitless, 
then the 1 would have to represent 1 
gallon, since you can't add things that 
have different units.) The function y is 
defined by an expression with units of 



gallons divided by gallons, so y is unit- 
less. Therefore the derivative dy/dx 
should have units of inverse gallons. 
Both terms in the expression for the 
derivative do have those units, so the 
units of the answer check out. 
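The extreme-value check described in example 23 can also be carried out numerically. The following Python sketch compares the quotient-rule result with a finite-difference estimate and prints the derivative for increasingly large x; the function names and sample points are assumptions made for illustration.

```python
def y(x):
    """The function from example 23."""
    return x / (1 + 3 * x)

def dy_dx(x):
    """Quotient-rule result: (1/u) dv/dx - (v/u^2) du/dx with v = x, u = 1 + 3x."""
    u = 1 + 3 * x
    return 1 / u - 3 * x / u ** 2

def derivative(f, x, dx=1e-6):
    """Centered finite-difference estimate of df/dx at x."""
    return (f(x + dx) - f(x - dx)) / (2 * dx)

# The formula and the numerical estimate should agree, and both should
# shrink toward zero as x grows, because y levels off at 1/3.
for x in (0.0, 1.0, 10.0, 1000.0):
    print(f"x={x}: y={y(x):.6f}, formula={dy_dx(x):.3e}, numeric={derivative(y, x):.3e}")
```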

2.7 Differentiation on 
a computer 

In this chapter you've learned a set 
of rules for evaluating derivatives: 
derivatives of products, quotients, 
functions inside other functions, 
etc. Because these rules exist, 
it's always possible to find a 
formula for a function's derivative, 
given the formula for the original 
function. Not only that, but there 
is no real creativity required, so a 
computer can be programmed to 
do all the drudgery. For example, 
you can download a free, open- 
source program called Yacas from 
yacas.sourceforge.net and 

install it on a Windows or Linux 
machine. There is even a version 
you can run in a web browser with- 
out installing any special software: 
http://yacas.sourceforge.net/yacasconsole.html
A typical session with Yacas looks 
like this: 






Example 24 



D(x) x^2
2*x
D(x) Exp(x^2)
2*x*Exp(x^2)
D(x) Sin(Cos(Sin(x)))
-Cos(x)*Sin(Sin(x))*Cos(Cos(Sin(x)))

Upright type represents your input,
and italicized type is the program's
output.

First I asked it to differentiate $x^2$
with respect to x, and it told me
the result was 2x. Then I did
the derivative of $e^{x^2}$, which I also
could have done fairly easily by
hand. (If you're trying this out
on a computer as you read along,
make sure to capitalize functions 
like Exp, Sin, and Cos.) Finally 
I tried an example where I didn't 
know the answer off the top of my 
head, and that would have been a 
little tedious to calculate by hand. 

Unfortunately things are a little 
less rosy in the world of integrals. 
There are a few rules that can help 
you do integrals, e.g., that the inte- 
gral of a sum equals the sum of the 
integrals, but the rules don't cover 
all the possible cases. Using Ya- 
cas to evaluate the integrals of the 
same functions, here's what hap- 
pens. 6 

Example 25 

Integrate(x) x^2
x^3/3
Integrate(x) Exp(x^2)
Integrate(x) Exp(x^2)
Integrate(x)
Sin(Cos(Sin(x)))
Integrate(x)
Sin(Cos(Sin(x)))



6 If you're trying these on your own 
computer, note that the long input line 
for the function sin cos sin x shouldn't be 
broken up into two lines as shown in the 
listing. 



The first one works fine, and I 
can easily verify that the answer 
is correct, by taking the derivative 
of $x^3/3$, which is $x^2$. (The answer
could have been $x^3/3 + 7$, or
$x^3/3 + c$, where c was any constant,
but Yacas doesn't bother to tell us 
that.) The second and third ones 
don't work, however; Yacas just 
spits back the input at us without 
making any progress on it. And 
it may not be because Yacas isn't 
smart enough to figure out these 
integrals. The function $e^{x^2}$ can't
be integrated at all in terms of a 
formula containing ordinary oper- 
ations and functions such as ad- 
dition, multiplication, exponentia- 
tion, trig functions, exponentials, 
and so on. 

That's not to say that a program 
like this is useless. For example, 
here's an integral that I wouldn't 
have known how to do, but that 
Yacas handles easily: 

Example 26 
Integrate (x) Sin(Ln(x)) 
(x*Sin(Ln(x)))/2 

-(x*Cos(Ln(x)))/2 

This one is easy to check by dif- 
ferentiating, but I could have been 
marooned on a desert island for a 
decade before I could have figured 
it out in the first place. There are 
various rules, then, for integration, 
but they don't cover all possible 
cases as the rules for differentiation 
do, and sometimes it isn't obvious 
which rule to apply. Yacas's ability 
to integrate sin ln x shows that it
had a rule in its bag of tricks that
I don't know, or didn't remember,
or didn't realize applied to this in- 
tegral. 

Back in the 17th century, when 
Newton and Leibniz invented cal- 
culus, there were no computers, so 
it was a big deal to be able to find 
a simple formula for your result. 
Nowadays, however, it may not be 
such a big deal. Suppose I want to 
find the derivative of sin cos sinx, 
evaluated at x = 1 . I can do some- 
thing like this on a calculator: 

Example 27 

sin cos sin 1 =
0.61813407
sin cos sin 1.0001 =
0.61810240
(0.61810240-0.61813407)
/.0001 =
-0.3167

I have the right answer, with 
plenty of precision for most realis- 
tic applications, although I might 
have never guessed that the mysterious
number −0.3167 was actually
−(cos 1)(sin sin 1)(cos cos sin 1).
This could get a little tedious if I 
wanted to graph the function, for 
instance, but then I could just use 
a computer spreadsheet, or write 
a little computer program. In this 
chapter, I'm going to show you 
how to do derivatives and integrals 
using simple computer programs, 
using Yacas. The following little 
Yacas program does the same 
thing as the set of calculator 
operations shown above: 






Example 28 



1 f(x):=Sin(Cos(Sin(x)))
2 x:=1
3 dx:=.0001
4 N( (f(x+dx)-f(x))/dx )
-0.3166671628

(I've omitted all of Yacas's output 
except for the final result.) Line 
1 defines the function we want to 
differentiate. Lines 2 and 3 give 
values to the variables x and dx. 
Line 4 computes the derivative; the 
N( ) surrounding the whole thing 
is our way of telling Yacas that we 
want an approximate numerical re- 
sult, rather than an exact symbolic 
one. 

An interesting thing to try now is 
to make dx smaller and smaller, 
and see if we get better and bet- 
ter accuracy in our approximation 
to the derivative. 

Example 29 

5 g(x,dx):=
N( (f(x+dx)-f(x))/dx )
6 g(x,.1)
-0.3022356406
7 g(x,.0001)
-0.3166671628
8 g(x,.0000001)
-0.3160458019
9 g(x,.00000000000000001)
0



Line 5 defines the derivative func- 
tion. It needs to know both x and 
dx. Line 6 computes the derivative 
using dx = 0.1, which we expect to
be a lousy approximation, since dx 
is really supposed to be infinitesi- 
mal, and 0.1 isn't even that small. 
Line 7 does it with the same value 
of dx we used earlier. The two results
agree exactly in the first decimal
place, and approximately in
the second, so we can be pretty 
sure that the derivative is —0.32 
to two figures of precision. Line 
8 ups the ante, and produces a re- 
sult that looks accurate to at least 
3 decimal places. Line 9 attempts 
to produce fantastic precision by 
using an extremely small value of 
dx. Oops — the result isn't bet- 
ter, it's worse! What's happened 
here is that Yacas computed f(x) 
and f(x + dx), but they were the 
same to within the precision it was 
using, so f(x + dx) — f(x) rounded 
off to zero. 7 

Example 29 demonstrates the con- 
cept of how a derivative can be de- 
fined in terms of a limit: 

$$\frac{dy}{dx} = \lim_{\Delta x \to 0} \frac{\Delta y}{\Delta x}$$

The idea of the limit is that we
can theoretically make Δy/Δx approach
as close as we like to dy/dx,
provided we make Δx sufficiently
small. In reality, of course, we 
eventually run into the limits of 
our ability to do the computation, 
as in the bogus result generated on 
line 9 of the example. 
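The same experiment can be repeated in other environments. The sketch below is a Python version, not the Yacas session from example 29; it assumes standard double-precision floats, which carry roughly 16 significant digits, so the individual digits will differ from the Yacas listing even though the qualitative behavior is the same.

```python
import math

def f(x):
    """The function being differentiated: sin(cos(sin(x)))."""
    return math.sin(math.cos(math.sin(x)))

def g(x, dx):
    """Finite-difference estimate (f(x+dx) - f(x))/dx of the derivative."""
    return (f(x + dx) - f(x)) / dx

# Symbolic result from example 24, evaluated at x = 1.
exact = -math.cos(1) * math.sin(math.sin(1)) * math.cos(math.cos(math.sin(1)))

for dx in (1e-1, 1e-4, 1e-7, 1e-17):
    print(f"dx={dx:g}: estimate {g(1.0, dx):.10f}, symbolic {exact:.10f}")
```

With dx = 1e-17 the sum 1.0 + dx is indistinguishable from 1.0 in double precision, so the numerator is exactly zero and the estimate collapses, which is the same failure described for line 9.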



7 Yacas can do arithmetic to any 
precision you like, although you may 
run into practical limits due to the 
amount of memory your computer has 
and the speed of its CPU. For fun, 
try N(Pi,1000), which tells Yacas to 
compute π numerically to 1000 decimal
places. 






Problems 

1 Carry out a calculation like 
the one in example 9 on page 26 
to show that the derivative of $t^4$
equals $4t^3$. > Solution, p. 169

2 Example 12 on page 29 gave 
a tricky argument to show that the 
derivative of cos t is −sin t. Prove
the same result using the method 
of example 11 instead. 

> Solution, p. 169 

3 Suppose H is a big number. 
Experiment on a calculator to figure
out whether $\sqrt{H+1} - \sqrt{H-1}$
comes out big, normal, or tiny. Try 
making H bigger and bigger, and 
see if you observe a trend. Based 
on these numerical examples, form 
a conjecture about what happens 
to this expression when H is infi- 
nite. > Solution, p. 170 

4 Suppose dx is a small but
finite number. Experiment on a
calculator to figure out how $\sqrt{dx}$
compares in size to dx. Try making
dx smaller and smaller, and
see if you observe a trend. Based
on these numerical examples, form
a conjecture about what happens
to this expression when dx is infinitesimal.
> Solution, p. 170

5 To which of the following 
statements can the transfer prin- 
ciple be applied? If you think it 
can't be applied to a certain state- 
ment, try to prove that the state- 
ment is false for the hyperreals, 
e.g., by giving a counterexample. 



(a) For any real numbers x and y, 
x + y = y + x. 

(b) The sine of any real number is 
between —1 and 1. 

(c) For any real number x, there 
exists another real number y that 
is greater than x. 

(d) For any real numbers x ≠ y,
there exists another real number z
such that x < z < y.

(e) For any real numbers x ≠ y,
there exists a rational number z
such that x < z < y. (A rational
number is one that can be expressed
as an integer divided by
another integer.)

(f) For any real numbers x, y, and
z, (x + y) + z = x + (y + z).

(g) For any real numbers x and y,
either x < y or x = y or x > y.

(h) For any real number x, x + 1 ≠
x. > Solution, p. 170

6 If we want to pump air 

or water through a pipe, com- 
mon sense tells us that it will be 
easier to move a larger quantity 
more quickly through a fatter pipe. 
Quantitatively, we can define the 
resistance, R, which is the ratio
of the pressure difference produced 
by the pump to the rate of flow. 
A fatter pipe will have a lower re- 
sistance. Two pipes can be used 
in parallel, for instance when you 
turn on the water both in the 
kitchen and in the bathroom, and 
in this situation, the two pipes let 
more water flow than either would 
have let flow by itself, which tells 
us that they act like a single pipe 
with some lower resistance. The 
equation for their combined resistance
is $R = 1/(1/R_1 + 1/R_2)$.
Analyze the case where one resis- 
tance is finite, and the other infi- 
nite, and give a physical interpre- 
tation. Likewise, discuss the case 
where one is finite, but the other is 
infinitesimal. 

> Solution, p. 170

7 Naively, we would imagine 
that if a spaceship traveling at u = 
3/4 of the speed of light was to 
shoot a missile in the forward di- 
rection at v = 3/4 of the speed 
of light (relative to the ship) , then 
the missile would be traveling at 
u + v = 3/2 of the speed of light. 
However, Einstein's theory of rela- 
tivity tells us that this is too good 
to be true, because nothing can go 
faster than light. In fact, the rela- 
tivistic equation for combining ve- 
locities in this way is not u + v, but 
rather (u + v)/(1 + uv). In ordinary,
everyday life, we never travel
at speeds anywhere near the speed 
of light. Show that the nonrela- 
tivistic result is recovered in the 
case where both u and v are in- 
finitesimal. > Solution, p. 171 

8 Differentiate $(2x + 3)^{100}$ with
respect to x. > Solution, p. 171

9 Differentiate $(x + 1)^{100}(x +
2)^{200}$ with respect to x.
> Solution, p. 171 

10 Differentiate the following 
with respect to x: e , e e . (In 
the latter expression, as in all ex- 
ponentials nested inside exponen- 
tials, the evaluation proceeds from 



the top down, i.e., $e^{(e^x)}$, not $(e^e)^x$.)

> Solution, p. 171 

11 Differentiate $a\sin(bx + c)$
with respect to x. 

> Solution, p. 171 

12 Let $x = t^{p/q}$, where p and
q are positive integers. By a technique
similar to the one in example
19 on p. 37, prove that the differentiation
rule for $t^k$ holds when
k = p/q. > Solution, p. ??

13 Find a function whose 
derivative with respect to x equals 
$a\sin(bx + c)$. That is, find an integral
of the given function.

> Solution, p. 172 

14 Use the chain rule to differentiate
$((x^2)^2)^2$, and show that you
get the same result you would have
obtained by differentiating $x^8$.

> Solution, p. 172 [M. Livshits] 

15 The range of a gun, when
elevated to an angle θ, is given by

$$R = \frac{2v^2}{g}\sin\theta\cos\theta.$$

Find the angle that will produce 
the maximum range. 

> Solution, p. 172 

16 The hyperbolic cosine func- 
tion is defined by 

$$\cosh x = \frac{e^x + e^{-x}}{2}.$$



Find any minima and maxima of 
this function. 

> Solution, p. 173 

17 Show that the function 

sin(sin(sin x)) has maxima and
minima at all the same places
where sin x does, and at no other

places. > Solution, p. 173 

18 In free fall, the acceleration 
will not be exactly constant, due 
to air resistance. For example, a 
skydiver does not speed up indefi- 
nitely until opening her chute, but 
rather approaches a certain maxi- 
mum velocity at which the upward 
force of air resistance cancels out 
the force of gravity. The expression
for the distance dropped by
a free-falling object, with air resistance,
is 8

$$d = A\ln\left[\cosh\left(t\sqrt{\frac{g}{A}}\right)\right],$$



where g is the acceleration the ob- 
ject would have without air resis- 
tance, the function cosh has been 
defined in problem 16, and A is a 
constant that depends on the size, 
shape, and mass of the object, and 
the density of the air. (For a sphere 
of mass m and diameter d dropping 
in air, A = Allm/d 2 . Cf. problem 
10, p. 113.) 

(a) Differentiate this expression to 
find the velocity. Hint: In order to 
simplify the writing, start by defin- 
ing some other symbol to stand for 
the constant $\sqrt{g/A}$.

(b) Show that your answer can be 
reexpressed in terms of the function
tanh defined by $\tanh x = (e^x -
e^{-x})/(e^x + e^{-x})$.

(c) Show that your result for the 
velocity approaches a constant for 



8 Jan Benacka and Igor Stubna, The 
Physics Teacher, 43 (2005) 432. 



large values of t. 

(d) Check that your answers to 

parts b and c have units of velocity. 

> Solution, p. 174 

19 Differentiate tan θ with respect
to θ. > Solution, p. 174

20 Differentiate yfx with re- 
spect to X. > Solution, p. 174 

21 Differentiate the following 
with respect to x: 

(a) $y = \sqrt{x^2 + 1}$

(b) $y = \sqrt{x^2 + a^2}$

(c) $y = 1/\sqrt{a + x}$

(d) $y = a/\sqrt{a - x^2}$

> Solution, p. 174 [Thompson, 1919] 

22 Differentiate ln(2t + 1) with
respect to t. > Solution, p. 175 

23 If you know the derivative of 
sin x, it's not necessary to use the
product rule in order to differentiate
3 sin x, but show that using the
product rule gives the right result 
anyway. > Solution, p. 175 

24 The Γ function (capital
Greek letter gamma) is a continuous
mathematical function that
has the property $\Gamma(n) = 1 \cdot 2 \cdot
\ldots \cdot (n - 1)$ for n an integer. Γ(x)
is also well defined for values of x
that are not integers, e.g., Γ(1/2)
happens to be $\sqrt{\pi}$. Use computer
software that is capable of evaluating
the Γ function to determine
numerically the derivative of Γ(x)
with respect to x, at x = 2. (In Yacas,
the function is called Gamma.)

> Solution, p. 175 
25 For a cylinder of fixed
surface area, what proportion of 
length to radius will give the max- 
imum volume? 

> Solution, p. 175 

26 This problem is a varia- 
tion on problem 11 on page 21. 
Einstein found that the equation 
K = (l/2)mv 2 for kinetic energy 
was only a good approximation for 
speeds much less than the speed of 
light, c. At speeds comparable to 
the speed of light, the correct equa- 
tion is 



$$K = mc^2\left(\frac{1}{\sqrt{1 - v^2/c^2}} - 1\right).$$

(a) As in the earlier, simpler problem,
find the power dK/dt for
an object accelerating at a steady
rate, with v = at.

(b) Check that your answer has the
right units.

(c) Verify that the power required
becomes infinite in the limit as v
approaches c, the speed of light.
This means that no material object
can go as fast as the speed of
light. > Solution, p. 176

27 Prove, as claimed on page
41, that the derivative of $\ln|x|$
equals 1/x, for both positive and
negative x. > Solution, p. 176

28 An even function is one with
the property f(−x) = f(x). For
example, cos x is an even function,
and $x^n$ is an even function
if n is even. An odd function has
f(−x) = −f(x). Prove that the
derivative of an even function is
odd. > Solution, p. 177

29 Suppose we have a list of
numbers $x_1, \ldots, x_n$, and we wish to
find some number q that is as close
as possible to as many of the $x_i$ as
possible. To make this a mathematically
precise goal, we need to
define some numerical measure of
this closeness. Suppose we let $h =
(x_1 - q)^2 + \ldots + (x_n - q)^2$, which can
also be notated using Σ, uppercase
Greek sigma, as $h = \sum_{i=1}^{n}(x_i - q)^2$.
Then minimizing h can be used as
a definition of optimal closeness.
(Why would we not want to use
$h = \sum_{i=1}^{n}(x_i - q)$?) Prove that
the value of q that minimizes h is
the average of the $x_i$.

30 Use a trick similar to the one 
used in example 16 to prove that 
the power rule $d(x^k)/dx = kx^{k-1}$
applies to cases where k is an inte- 
ger less than 0. 

> Solution, p. 177 * 

31 The plane of Euclidean ge- 
ometry is today often described 
as the set of all coordinate pairs 
(x,y), where x and y are real. We 
could instead imagine the plane F 
that is defined in the same way, but 
with x and y taken from the set 
of hyperreal numbers. As a third 
alternative, there is the plane G 
in which the finite hyperreals are 
used. In E, Euclid's parallel postu- 
late holds: given a line and a point 
not on the line, there exists ex- 
actly one line passing through the 
point that does not intersect the 
line. Does the parallel postulate 
hold in F? In G? Is it valid to associate
only E with the plane described
by Euclid's axioms?

> Solution, p. 177 * 

32 Discuss the following state- 
ment: The repeating decimal 
0.999 . . . is infinitesimally less than 
one. > Solution, p. 178 

33 Example 18 on page 37 ex- 
pressed the chain rule without the 
Leibniz notation, writing a function
f defined by f(x) = g(h(x)).
Suppose that you're trying to remember
the rule, and two of the
possibilities that come to mind are
f'(x) = g'(h(x)) and f'(x) =
g'(h(x))h(x). Show that neither
of these can possibly be right, by 
considering the case where x has 
units. You may find it helpful to 
convert both expressions back into 
the Leibniz notation. 

> Solution, p. 178 

34 When you tune in a radio 
station using an old-fashioned ro- 
tating dial you don't have to be 
exactly tuned in to the right fre- 
quency in order to get the station. 
If you did, the tuning would be in- 
finitely sensitive, and you'd never 
be able to receive any signal at all! 
Instead, the tuning has a certain 
amount of "slop" intentionally de- 
signed into it. The strength of the 
received signal s can be expressed 
in terms of the dial's setting f by
a function of the form

$$s = \frac{1}{\sqrt{a(f^2 - f_0^2)^2 + bf^2}},$$

where a, b, and $f_0$ are constants.
This functional form is in fact
very general, and is encountered in
many other physical contexts. The
graph below shows the resulting
bell-shaped curve. Find the frequency
f at which the maximum
response occurs, and show that if b
is small, the maximum occurs close
to, but not exactly at, $f_0$.

> Solution, p. 178


The function of problem 
34, with a = 3, b = 1 , and 



35 In a movie theater, the 

image on the screen is formed by 
a lens in the projector, and orig- 
inates from one of the frames on 
the strip of celluloid film (or, in the 
newer digital projection systems, 
from a liquid crystal chip). Let the 
distance from the film to the lens 
be x, and let the distance from the 
lens to the screen be y. The pro- 
jectionist needs to adjust x so that 
it is properly matched with y, or 
else the image will be out of focus. 
There is therefore a fixed relationship
between x and y, and this relationship
is of the form

$$\frac{1}{x} + \frac{1}{y} = \frac{1}{f},$$

where f is a property of the lens,
called its focal length. A stronger
lens has a shorter focal length.
Since the theater is large, and the
projector is relatively small, x is
much less than y. We can see
from the equation that if y is sufficiently
large, the left-hand side of
the equation is dominated by the
1/x term, and we have x ≈ f.
Since the 1/y term doesn't completely
vanish, we must have x
slightly greater than f, so that the
1/x term is slightly less than 1/f.
Let x = f + dx, and approximate
dx as being infinitesimally small.
Find a simple expression for y in
terms of f and dx.

> Solution, p. 179

Problem 35. A set of light rays is emitted from the tip of the glamorous movie
star's nose on the film, and reunited to form a spot on the screen which is the
image of the same point on his nose. The distances have been distorted for
clarity. The distance y represents the entire length of the theater from front to
back.

36 Why might the expression
$1^\infty$ be considered an indeterminate
form? > Solution, p. 180

3 Limits and continuity



3.1 Continuity 

Intuitively, a continuous function 
is one whose graph has no sudden 
jumps in it; the graph is all a sin- 
gle connected piece. Formally, a 
function f(x) is defined to be con- 
tinuous if for any real x and any 
infinitesimal dx, f(x + dx) — f(x) 
is infinitesimal. 

Example 30 
Let the function f be defined by f(x) = 0
for x ≤ 0, and f(x) = 1 for x > 0.
Then f(x) is discontinuous, since for
dx > 0, f(0 + dx) - f(0) = 1, which isn't
infinitesimal.



a / Example 30. The 
black dot indicates that 
the endpoint of the lower 
ray is part of the ray, 
while the white one 
shows the contrary for 
the ray on the top. 



If a function is discontinuous at a
given point, then it is not differentiable
at that point. On the other
hand, the example y = |x| shows
that a function can be continuous
without being differentiable.

In most cases, there is no need
to invoke the definition explicitly
in order to check whether a function
is continuous. Most of the
functions we work with are defined
by putting together simpler
functions as building blocks. For
example, let's say we're already
convinced that the functions defined
by g(x) = 3x and h(x) =
sin x are both continuous. Then if
we encounter the function f(x) =
sin(3x), we can tell that it's continuous
because its definition corresponds
to f(x) = h(g(x)). The
functions g and h have been set
up like a bucket brigade, so that
g takes the input, calculates the
output, and then hands it off to
h for the final step of the calculation.
This method of combining
functions is called composition.
The composition of two continuous
functions is also continuous. Just
watch out for division. The function
f(x) = 1/x is continuous everywhere
except at x = 0, so for
example 1/sin(x) is continuous everywhere
except at multiples of π,
where the sine has zeroes.



The intermediate value theorem 

Another way of thinking about 
continuous functions is given by 



the intermediate value theorem.
Intuitively, it says that if you are 
moving continuously along a road, 
and you get from point A to point 
B, then you must also visit every 
other point along the road; only by 
teleporting (by moving discontin- 
uously) could you avoid doing so. 
More formally, the theorem states
that if y is a continuous real-valued
function on the real interval from a
to b, and if y takes on values $y_1$ and
$y_2$ at certain points within this interval,
then for any $y_3$ between $y_1$
and $y_2$, there is some real x in the
interval for which $y(x) = y_3$.




b / The intermediate value theorem
states that if the function is continuous,
it must pass through $y_3$.



The intermediate value theorem 
seems so intuitively appealing that 
if we want to set out to prove it, 
we may feel as though we're being 
asked to prove a proposition such 
as, "a number greater than 10 ex- 
ists." If a friend wanted to bet 
you a six-pack that you couldn't 



prove this with complete mathe- 
matical rigor, you would have to 
get your friend to spell out very 
explicitly what she thought were 
the facts about integers that you 
were allowed to start with as ini- 
tial assumptions. Are you allowed 
to assume that 1 exists? Will she 
grant you that if a number n ex- 
ists, so does n + 1? The interme- 
diate value theorem is similar. It's 
stated as a theorem about certain 
types of functions, but its truth 
isn't so much a matter of the prop- 
erties of functions as the properties 
of the underlying number system. 
For the reader with an interest in
pure mathematics, I've discussed 
this in more detail on page 152 and 
given an abbreviated proof. (Most 
introductory calculus texts do not 
prove it at all.) 

Example 31 

> Show that there is a solution to the
equation $10^x + x = 1000$.

> We expect there to be a solution
near x = 3, where the function f(x) =
$10^x + x$ = 1003 is just a little too big.
On the other hand, f(2) = 102 is much
too small. Since f has values above
and below 1000 on the interval from
2 to 3, and f is continuous, the intermediate
value theorem proves that a
solution exists between 2 and 3. If we
wanted to find a better numerical approximation
to the solution, we could
do it using Newton's method, which is
introduced in section 5.1.
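
The sign change that the intermediate value theorem relies on can also be used directly to home in on the solution. The following is only a sketch, using bisection (repeatedly halving the interval) rather than the Newton's method mentioned above; the names `f` and `bisect` are my own choices for illustration.

```python
def f(x):
    """10^x + x - 1000, which is zero at a solution of 10^x + x = 1000."""
    return 10.0 ** x + x - 1000.0


def bisect(func, lo, hi, tol=1e-12):
    """Halve [lo, hi] around a sign change of func until it is narrower than tol."""
    if func(lo) * func(hi) > 0:
        raise ValueError("func must change sign on [lo, hi]")
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        # Keep whichever half still contains the sign change.
        if func(lo) * func(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)


root = bisect(f, 2.0, 3.0)
print(root, 10.0 ** root + root)
```

Each pass through the loop is a fresh application of the theorem to a smaller interval, which is why the method cannot lose the solution.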

Example 32 
> Show that there is at least one solution
to the equation cos x = x, and
give bounds on its location.

> This is a transcendental equation, 
and no amount of fiddling with alge- 
bra and trig identities will ever give a 
closed-form solution, i.e., one that can 
be written down with a finite number of 
arithmetic operations to give an exact 
result. However, we can easily prove 
that at least one solution exists, by 
applying the intermediate value theo- 
rem to the function x - cosx. The 
cosine function is bounded between 
-1 and 1, so this function must be 
negative for x < -1 and positive for 
x > 1 . By the intermediate value the- 
orem, there must be a solution in the 
interval -1 < x < 1. The graph, c, 
verifies this, and shows that there is 
only one solution. 
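
As a numerical companion to the graph, here is a rough sketch that checks the signs at the ends of the interval, counts sign changes on a grid, and then narrows down the crossing. The grid spacing and the name `g` are arbitrary choices of mine, and a grid can only suggest, not prove, that there is a single solution.

```python
import math


def g(x):
    """The function x - cos x from this example."""
    return x - math.cos(x)


# Signs at the ends of the interval -1 <= x <= 1.
print(g(-1.0), g(1.0))

# Count upward sign changes on a grid of 2001 points.
n = 2000
xs = [-1.0 + 2.0 * i / n for i in range(n + 1)]
crossings = [(a, b) for a, b in zip(xs, xs[1:]) if g(a) < 0.0 <= g(b)]
print(len(crossings))

# Narrow the first crossing by repeated halving.
lo, hi = crossings[0]
for _ in range(60):
    mid = 0.5 * (lo + hi)
    if g(mid) < 0.0:
        lo = mid
    else:
        hi = mid
root = 0.5 * (lo + hi)
print(root)
```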




c / The function x - cos x 
constructed in example 
32. 



Example 33 

> Prove that every odd-order polyno- 
mial P with real coefficients has at 
least one real root x, i.e., a point at 
which P(x) = 0. 

> Example 32 might have given the 
impression that there was nothing 



to be learned from the intermediate 
value theorem that couldn't be deter- 
mined by graphing, but this example 
clearly can't be solved by graphing, 
because we're trying to prove a gen- 
eral result for all polynomials. 

To see that the restriction to odd orders
is necessary, consider the polynomial
$x^2 + 1$, which has no real roots
because $x^2 \ge 0$ for any real number
x.

To fix our minds on a concrete example
for the odd case, consider the
polynomial $P(x) = x^3 - x + 17$. For
large values of x, the linear and constant
terms will be negligible compared
to the $x^3$ term, and since $x^3$
is positive for large values of x and
negative for large negative ones, it follows
that P is sometimes positive and
sometimes negative.

Making this argument more general
and rigorous, suppose we had a polynomial
of odd order n that always had
the same sign for real x. Then by the
transfer principle the same would hold
for any hyperreal value of x. Now if x
is infinite then the lower-order terms
are infinitesimal compared to the $x^n$
term, and the sign of the result is determined
entirely by the $x^n$ term, but
$x^n$ and $(-x)^n$ have opposite signs, and
therefore P(x) and P(-x) have opposite
signs. This is a contradiction,
so we have disproved the assumption
that P always had the same sign for
real x. Since P is sometimes negative
and sometimes positive, we conclude
by the intermediate value theorem
that it is zero somewhere.



Example 34 
> Show that the equation x = sin 1/x
has infinitely many solutions.

> This is another example that can't 
be solved by graphing; there is clearly 
no way to prove, just by looking at 
a graph like d, that it crosses the x 
axis infinitely many times. The graph 
does, however, help us to gain intu- 
ition for what's going on. As x gets 
smaller and smaller, 1/x blows up, 
and sin 1/x oscillates more and more 
rapidly. The function f(x) = x - sin 1/x is undefined
at 0, but it's continuous everywhere
else, so we can apply the intermedi- 
ate value theorem to any interval that 
doesn't include 0. 

We want to prove that for any positive
u, there exists an x with 0 < x < u
for which f(x) has either desired sign.
Suppose that this fails for some real 
u. Then by the transfer principle the 
nonexistence of any real x with the de- 
sired property also implies the nonex- 
istence of any such hyperreal x. But 
for an infinitesimal x the sign of f is 
determined entirely by the sine term, 
since the sine term is finite and the lin- 
ear term infinitesimal. Clearly sin 1/x 
can't have a single sign for all values 
of x less than u, so this is a contradic- 
tion, and the proposition succeeds for 
any u. It follows from the intermediate 
value theorem that there are infinitely 
many solutions to the equation. 



The extreme value theorem 

In chapter 1, we saw that locat- 
ing maxima and minima of func- 
tions may in general be fairly dif- 
ficult, because there are so many 




d / The function x - sin 1/x.



different ways in which a function 
can attain an extremum: e.g., at 
an endpoint, at a place where its 
derivative is zero, or at a nondiffer- 
entiable kink. The following theo- 
rem allows us to make a very gen- 
eral statement about all these pos- 
sible cases, assuming only continu- 
ity. 

The extreme value theorem states
that if f is a continuous real-valued
function on the real-number interval
defined by a ≤ x ≤ b, then f
has maximum and minimum values
on that interval, which are attained
at specific points in the interval.

Let's first see why the assumptions
are necessary. If we weren't confined
to a finite interval, then y =
x would be a counterexample, because
it's continuous and doesn't
have any maximum or minimum
value. If we didn't assume continuity,
then we could have a function
defined as y = x for x < 1,
and y = 0 for x ≥ 1; this function
never gets bigger than 1, but
it never attains a value of 1 for any
specific value of x.

The extreme value theorem is 
proved, in a somewhat more gen- 
eral form, on page 155. 



Example 35 

> Find the maximum value of the polynomial
$P(x) = x^3 + x^2 + x + 1$ for
-5 ≤ x ≤ 5.

> Polynomials are continuous, so the 
extreme value theorem guarantees 
that such a maximum exists. Suppose 
we try to find it by looking for a place 
where the derivative is zero. The 
derivative is 3x 2 + 2x + 1 , and setting it 
equal to zero gives a quadratic equa- 
tion, but application of the quadratic 
formula shows that it has no real so- 
lutions. It appears that the function 
doesn't have a maximum anywhere 
(even outside the interval of interest) 
that looks like a smooth peak. Since it 
doesn't have kinks or discontinuities, 
there is only one other type of maximum
it could have, which is a maximum
at one of its endpoints. Plugging
in the limits, we find P(-5) = -104
and P(5) = 156, so we conclude that
the maximum value on this interval is
156.
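
The arithmetic in this example is easy to check by machine. This short sketch evaluates the discriminant of the derivative, the two endpoint values, and a brute-force maximum over a grid; the grid is my own addition as a sanity check and is not part of the argument.

```python
def P(x):
    """The polynomial x^3 + x^2 + x + 1 from example 35."""
    return x ** 3 + x ** 2 + x + 1


# Discriminant b^2 - 4ac of the derivative 3x^2 + 2x + 1.
# A negative value means the derivative has no real zeroes.
disc = 2 ** 2 - 4 * 3 * 1
print(disc)

# Values at the endpoints of the interval.
print(P(-5), P(5))

# Brute-force maximum over 1001 evenly spaced points from -5 to 5.
grid = [-5 + 10 * i / 1000 for i in range(1001)]
grid_max = max(P(x) for x in grid)
print(grid_max)
```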
3.2 Limits

Historically, the calculus of in- 
finitesimals as created by New- 
ton and Leibniz was reinterpreted 
in the nineteenth century by 
Cauchy, Bolzano, and Weierstrass 
in terms of limits. All mathemati- 
cians learned both languages, and 
switched back and forth between 
them effortlessly, like the lady I 
overheard in a Southern California 
supermarket telling her mother, 
"Let's get that one, con los nuts." 
Those who had been trained in in- 
finitesimals might hear a statement 
using the language of limits, but 
translate it mentally into infinites- 
imals; to them, every statement 
about limits was really a state- 
ment about infinitesimals. To their 
younger colleagues, trained using 
limits, every statement about in- 
finitesimals was really to be under- 
stood as shorthand for a limiting 
process. When Robinson laid the 
rigorous foundations for the hyper- 
real number system in the 1960's, a 
common objection was that it was 
really nothing new, because ev- 
ery statement about infinitesimals 
was really just a different way of 
expressing a corresponding state- 
ment about limits; of course the 
same could have been said about 
Weierstrass's work of the preced- 
ing century! In reality, all prac- 
titioners of calculus had realized 
all along that different approaches 
worked better for different prob- 
lems; problem 12 on page 82 is an 
example of a result that is much 



easier to prove with infinitesimals 
than with limits. 

The Weierstrass definition of a 
limit is this: 



Definition of the limit 

We say that L is the limit of the
function f(x) as x approaches a,
written

$$\lim_{x\to a} f(x) = L,$$

if the following is true: for any positive real
number ε, there exists another positive real
number δ such that for all x in the
interval a - δ < x < a + δ, other than a itself, the value
of f lies within the range from L - ε
to L + ε.



Intuitively, the idea is that if I want 
you to make f(x) close to L, I just
have to tell you how close, and you 
can tell me that it will be that close 
as long as x is within a certain dis- 
tance of a. 
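
This back-and-forth can be acted out numerically. The sketch below uses the function 1/(1 + x) near a = 0 with L = 1, and my own choice of δ = min(ε, 1)/2, which one can check by algebra is small enough. The function names are hypothetical, and sampling points in the interval is only a spot check, not a proof.

```python
def f(x):
    """A sample function whose limit as x approaches 0 is 1."""
    return 1.0 / (1.0 + x)


def delta_for(eps):
    """One workable answer to the challenge 'make f(x) within eps of 1'."""
    return min(eps, 1.0) / 2.0


def check(eps, samples=10001):
    """Spot-check that |f(x) - 1| <= eps at evenly spaced x within delta of 0."""
    d = delta_for(eps)
    for i in range(samples):
        x = -d + 2.0 * d * i / (samples - 1)
        if abs(f(x) - 1.0) > eps:
            return False
    return True


for eps in [1.0, 0.1, 1e-3, 1e-6]:
    print(eps, delta_for(eps), check(eps))
```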

In terms of infinitesimals, we have: 



Definition of the limit 

We say that L is the limit of the
function f(x) as x approaches a,
written

$$\lim_{x\to a} f(x) = L,$$

if the following is true: for any nonzero
infinitesimal number dx, the value of
f(a + dx) is finite, and the standard
part of f(a + dx) equals L.

The two definitions are equivalent.

Sometimes a limit can be evaluated 
simply by plugging in numbers: 



Example 36
> Evaluate

$$\lim_{x\to 0}\frac{1}{1 + x}.$$

> Plugging in x = 0, we find that the
limit is 1.



In some examples, plugging in fails 
if we try to do it directly, but can 
be made to work if we massage the 
expression into a different form: 



Example 37
> Evaluate

$$\lim_{x\to 0}\frac{\frac{2}{x} + 7}{\frac{1}{x} + 8686}.$$

> Plugging in x = 0 fails because division
by zero is undefined.

Intuitively, however, we expect that the
limit will be well defined, and will equal
2, because for very small values of
x, the numerator is dominated by the
2/x term, and the denominator by the
1/x term, so the 7 and 8686 terms will
matter less and less as x gets smaller
and smaller.

To demonstrate this more rigorously, a
trick that works is to multiply both the
top and the bottom by x, giving

$$\frac{2 + 7x}{1 + 8686x},$$

which equals 2 when we plug in x = 0,
so we find that the limit is 2.



This example is a little subtle, because 
when x equals zero, the function is not 



defined, and moreover it would not be 
valid to multiply both the top and the 
bottom by x. In general, it's not valid 
algebra to multiply both the top and 
the bottom of a fraction by 0, because 
the result is 0/0, which is undefined. 
But we didn't actually multiply both the 
top and the bottom by zero, because 
we never let x equal zero. Both the 
Weierstrass definition and the defini- 
tion in terms of infinitesimals only re- 
fer to the properties of the function in a 
region very close to the limiting point, 
not at the limiting point itself. 

This is an example in which the func- 
tion was not well defined at a certain 
point, and yet the limit of the function 
was well defined as we approached 
that point. In a case like this, where 
there is only one point missing from 
the domain of the function, it is natural 
to extend the definition of the function 
by filling in the "gap tooth." Example 
39 below shows that this kind of filling- 
in procedure is not always possible. 
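
A quick numerical experiment makes the point of example 37 concrete: the original expression cannot be evaluated at x = 0, but its values close to zero agree with the rewritten form, which can. The names `original` and `rewritten` are just labels I've chosen for this sketch.

```python
def original(x):
    """(2/x + 7)/(1/x + 8686), undefined at x = 0."""
    return (2.0 / x + 7.0) / (1.0 / x + 8686.0)


def rewritten(x):
    """The same expression after multiplying top and bottom by x."""
    return (2.0 + 7.0 * x) / (1.0 + 8686.0 * x)


for x in [1e-3, 1e-6, 1e-9, 1e-12]:
    print(x, original(x), rewritten(x))

# The rewritten form fills in the gap at x = 0.
print(rewritten(0.0))
```

Note how slowly the values creep toward 2: because of the large constant 8686, x has to be far smaller than 1/8686 before the 1/x terms dominate.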




e / Example 38, the function
$1/x^2$.



Example 38
> Investigate the limiting behavior of
$1/x^2$ as x approaches 0, and 1.

> At x = 1, plugging in works, and we
find that the limit is 1.

At x = 0, plugging in doesn't work,
because division by zero is undefined.
Applying the definition in terms
of infinitesimals to the limit as x approaches
0, we need to find out
whether $1/(0 + dx)^2$ is finite for infinitesimal
dx, and if so, whether it always
has the same standard part. But
clearly $1/(0 + dx)^2 = dx^{-2}$ is always
infinite, and we conclude that this limit
is undefined.




f / Example 39, the function
$\tan^{-1}(1/x)$.



Example 39 
> Investigate the limiting behavior of
$f(x) = \tan^{-1}(1/x)$ as x approaches 0.

> Plugging in doesn't work, because 
division by zero is undefined. 

In the definition of the limit in terms
of infinitesimals, the first requirement
is that f(0 + dx) be finite for infinitesimal
values of dx. The graph makes
this look plausible, and indeed we can
prove that it is true by the transfer principle.
For any real x we have -π/2 <
f(x) < π/2, and by the transfer principle
this holds for the hyperreals as
well, and therefore f(0 + dx) is finite.

The second requirement is that the
standard part of f(0 + dx) have a
uniquely defined value. The graph
shows that we really have two cases
to consider, one on the right side of
the graph, and one on the left. Intuitively,
we expect that the standard
part of f(0 + dx) will equal π/2 for positive
dx, and -π/2 for negative, and
thus the second part of the definition
will not be satisfied. For a more formal
proof, we can use the transfer principle.
For real x with 0 < x < 1/2, for example,
f is always positive and greater
than 1, so we conclude based on the
transfer principle that f(0 + dx) > 1
for positive infinitesimal dx. But on
similar grounds we can be sure that
f(0 + dx) < -1 when dx is negative
and infinitesimal. Thus the standard
part of f(0 + dx) can have different values
for different infinitesimal values of
dx, and we conclude that the limit is
undefined.

In examples like this, we can define 
a kind of one-sided limit, notated like 
this: 

$$\lim_{x\to 0^-}\tan^{-1}\frac{1}{x} = -\frac{\pi}{2}$$

$$\lim_{x\to 0^+}\tan^{-1}\frac{1}{x} = \frac{\pi}{2}$$

where the notations $x\to 0^-$ and
$x\to 0^+$ are to be read "as x approaches
zero from below," and "as x
approaches zero from above."
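
The two one-sided limits show up clearly if we act this out with numbers that are small but not infinitesimal. This is only a numerical illustration; the name `f` follows the example, and the particular sample values are arbitrary.

```python
import math


def f(x):
    """The function arctan(1/x) from example 39, undefined at x = 0."""
    return math.atan(1.0 / x)


# Approach zero from above and from below.
for dx in [1e-1, 1e-3, 1e-6, 1e-9]:
    print(dx, f(dx), f(-dx))

print(math.pi / 2)
```

The right-hand column settles toward -π/2 and the middle one toward +π/2, so no single number can serve as a two-sided limit.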

3.3 L'Hopital's rule 

Consider the limit

$$\lim_{x\to 0}\frac{\sin x}{x}.$$

Plugging in doesn't work, because
we get 0/0. Division by zero is 
undefined, both in the real num- 
ber system and in the hyperreals. 
A nonzero number divided by a 
small number gives a big number; a 
nonzero number divided by a very 
small number gives a very big num- 
ber; and a nonzero number divided 
by an infinitesimal number gives 
an infinite number. On the other 
hand, dividing zero by zero means 
looking for a solution to the equation
0 = 0q, where q is the result
of the division. But any q is a 
solution of this equation, so even 
speaking casually, it's not correct 
to say that 0/0 is infinite; it's not 
infinite, it's anything we like. 

Since plugging in zero didn't work, 
let's try estimating the limit by 
plugging in a number for x that's 
small, but not zero. On a calcula- 
tor, 



$$\frac{\sin 0.00001}{0.00001} = 0.999999999983333.$$



It looks like the limit is 1. We can 
confirm our conjecture to higher 
precision using Yacas's ability to 
do high-precision arithmetic: 

N(Sin(10^-20)/10^-20,50)
0.99999999999999999999999999999999999999998333333333

It's looking pretty one-ish. This is 
the idea of the Weierstrass defini- 
tion of a limit: it seems like we can 
get an answer as close to 1 as we 



like, if we're willing to make x as 
close to 0 as necessary. The graph
helps to make this plausible. 




g / The graph of sin x/x.

The general idea here is that for 
small values of x, the small-angle 
approximation sin x ≈ x holds,
and as x gets smaller and smaller, 
the approximation gets better and 
better, so sin x/x gets closer and 
closer to 1. 
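
To see how quickly the approximation improves, this sketch compares sin x/x with the estimate 1 - x²/6, which comes from keeping one more term of the series for the sine. That extra term is a standard fact that I'm bringing in for comparison; it isn't needed for the argument in the text.

```python
import math

# Compare sin(x)/x with the estimate 1 - x^2/6 for shrinking x.
rows = []
for x in [0.1, 0.01, 0.001]:
    ratio = math.sin(x) / x
    estimate = 1.0 - x * x / 6.0
    rows.append((x, ratio, estimate))
    print(x, ratio, estimate, ratio - estimate)
```

The difference in the last column shrinks roughly like the fourth power of x, which is why the ratio hugs 1 so closely for small x.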

But we still haven't proved rigor- 
ously that the limit is exactly 1. 
Let's try using the definition of the 
limit in terms of infinitesimals. 



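$$\lim_{x\to 0}\frac{\sin x}{x} = \mathrm{st}\left[\frac{\sin(0 + dx)}{0 + dx}\right] = \mathrm{st}\left[\frac{dx + \ldots}{dx}\right],$$

where ... stands for terms of order
$dx^2$. So

$$\lim_{x\to 0}\frac{\sin x}{x} = \mathrm{st}\left[1 + \frac{\ldots}{dx}\right] = 1.$$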





We can check our work using Inf:

: (sin d)/d
1+(-0.16667)d^2+...

(The ... is where I've snipped 
trailing terms from the output.) 

This is a special case of the following
rule for calculating limits
involving 0/0:

L'Hôpital's rule (simplest form)
If u and v are functions with
u(a) = 0 and v(a) = 0, the derivatives
$\dot{u}(a)$ and $\dot{v}(a)$ are defined, and
the derivative $\dot{v}(a) \ne 0$, then

$$\lim_{x\to a}\frac{u}{v} = \frac{\dot{u}(a)}{\dot{v}(a)}.$$

Proof: Since u(a) = 0, and the
derivative du/dx is defined at a,
u(a + dx) = du is infinitesimal, and
likewise for v. By the definition of
the limit, the limit is the standard
part of

$$\frac{du}{dv} = \frac{du/dx}{dv/dx},$$

where by assumption the numerator
and denominator are both
defined (and finite, because the
derivative is defined in terms of
the standard part). The standard
part of a quotient like p/q
equals the quotient of the standard
parts, provided that both p
and q are finite (which we've established),
and the standard part of q is nonzero (which is true
by assumption). But the standard
part of du/dx is the definition of
the derivative $\dot{u}$, and likewise for
dv/dx, so this establishes the result.

We will generalize l'Hôpital's rule
on p. 65.

By the way, the housetop accent
on the "ô" in l'Hôpital means that
in Old French it used to be spelled
and pronounced "l'Hospital," but
the "s" later became silent, so they
stopped writing it. So yes, it is the
same word as "hospital."

Example 40
> Evaluate

$$\lim_{x\to 0}\frac{e^x - 1}{x}.$$

> Taking the derivatives of the top and
bottom, we find $e^x/1$, which equals 1
when evaluated at x = 0.



Example 41
> Evaluate

$$\lim_{x\to 1}\frac{x - 1}{x^2 - 2x + 1}.$$

> Plugging in x = 1 fails, because both
the top and the bottom are zero. Taking
the derivatives of the top and bottom,
we find 1/(2x - 2), which blows
up to infinity when x = 1. To symbolize
the fact that the limit is undefined,
and undefined because it blows up to
infinity, we write

$$\lim_{x\to 1}\frac{x - 1}{x^2 - 2x + 1} = \infty.$$



3.4 Another perspective on indeterminate forms

An expression like 0/0, called 
an indeterminate form, can be 
thought of in a different way in 
terms of infinitesimals. Suppose 
I tell you I have two infinitesimal 
numbers d and e in my pocket, 
and I ask you whether d/e is fi- 
nite, infinite, or infinitesimal. You 
can't tell, because d and e might 
not be infinitesimals of the same 
order of magnitude. For instance, 
if e = 37d, then d/e = 1/37 is finite;
but if $e = d^2$, then d/e is infinite;
and if $d = e^2$, then d/e is
infinitesimal. Acting this out with
numbers that are small but not infinitesimal,

$$\frac{.001}{.037} = \frac{1}{37}$$

$$\frac{.001}{.000001} = 1000$$

$$\frac{.000001}{.001} = .001$$



On the other hand, suppose I tell 
you I have an infinitesimal num- 
ber d and a finite number x, and 
I ask you to speculate about d/x. 
You know for sure that it's going to 
be infinitesimal. Likewise, you can 
be sure that x/d is infinite. These 
aren't indeterminate forms. 

We can do something similar with 
infinite numbers. If H and K are 



both infinite, then H — K is inde- 
terminate. It could be infinite, for 
example, if H was positive infinite 
and K = H/2. On the other hand, 
it could be finite if H = K + 1. 
Acting this out with big but finite 
numbers, 

1000 - 500 = 500 
1001 - 1000 = 1 



Example 42 

> If H is a positive infinite number,
is $\sqrt{H + 1} - \sqrt{H - 1}$ finite, infinite, infinitesimal,
or indeterminate?

> Trying it with a finite, big number, we 
have 



$$\sqrt{1000001} - \sqrt{999999} = 1.00000000020373 \times 10^{-3}$$

which is clearly a wannabe infinites- 
imal. We can verify the result using 
Inf: 



: H=1/d
d^-1
: sqrt(H+1)-sqrt(H-1)
d^1/2+0.125d^5/2+...

For convenience, the first line of input 
defines an infinite number H in terms 
of the calculator's built-in infinitesimal 
d. The result has only positive powers 
of d, so it's clearly infinitesimal. 

More rigorously, we can rewrite
the expression as $\sqrt{H}\left(\sqrt{1 + 1/H} - \sqrt{1 - 1/H}\right)$.
Since the derivative of
the square root function $\sqrt{x}$ evaluated
at x = 1 is 1/2, we can approximate
this as

$$\sqrt{H}\left[1 + \frac{1}{2H} + \ldots - \left(1 - \frac{1}{2H} + \ldots\right)\right] = \sqrt{H}\left[\frac{1}{H} + \ldots\right] = \frac{1}{\sqrt{H}} + \ldots,$$

which is infinitesimal.
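
The estimate 1/√H can be compared against direct evaluation for big but finite H. Subtracting two nearly equal square roots loses digits in floating-point arithmetic, so this sketch also evaluates the algebraically equivalent form 2/(√(H+1) + √(H-1)); both function names are my own, and the rewritten form is an addition of mine rather than something used in the text.

```python
import math


def naive(H):
    """sqrt(H+1) - sqrt(H-1), computed by direct subtraction."""
    return math.sqrt(H + 1.0) - math.sqrt(H - 1.0)


def stable(H):
    """The same quantity rewritten as 2/(sqrt(H+1) + sqrt(H-1))."""
    return 2.0 / (math.sqrt(H + 1.0) + math.sqrt(H - 1.0))


for H in [1e2, 1e6, 1e12]:
    print(H, naive(H), stable(H), 1.0 / math.sqrt(H))
```

For H equal to a million, the last few digits of the naive result are rounding noise of the same kind seen in the calculator output quoted above.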

3.5 Limits at infinity 

The definition of the limit in terms 
of infinitesimals extends immedi- 
ately to limiting processes where 
x gets bigger and bigger, rather 
than closer and closer to some fi- 
nite value. For example, the func- 
tion 3 + 1/x clearly gets closer 
and closer to 3 as x gets bigger
and bigger. If a is an infinite 
number, then the definition says 
that evaluating this expression at 
a + dx, where dx is infinitesimal, 
gives a result whose standard part 
is 3. It doesn't matter that a 
happens to be infinite, the defini- 
tion still works. We also note that 
in this example, it doesn't matter 
what infinite number a is; the limit 
equals 3 for any infinite a. We can 
write this fact as 



$$\lim_{x\to\infty}\left(3 + \frac{1}{x}\right) = 3,$$

where the symbol ∞ is to be interpreted
as "nyeah nyeah, I don't
even care what infinite number you
put in here, I claim it will work
out to 3 no matter what." The
symbol ∞ is not to be interpreted
as standing for any specific infinite 



number. That would be the type 
of fallacy that lay behind the bo- 
gus proof on page 30 that 1 = 1/2, 
which assumed that all infinities 
had to be the same size. 

A somewhat different example is 
the arctangent function. The arc- 
tangent of 1000 equals approxi- 
mately 1.5698, and inputting big- 
ger and bigger numbers gives an- 
swers that appear to get closer 
and closer to π/2 ≈ 1.5708. But
the arctangent of -1000 is approximately
-1.5698, i.e., very close to
-π/2. From these numerical observations,
we conjecture that

$$\lim_{x\to a}\tan^{-1} x$$

equals π/2 for positive infinite a,
but -π/2 for negative infinite a.
It would not be correct to write

$$\lim_{x\to\infty}\tan^{-1} x = \frac{\pi}{2} \qquad \text{[wrong]}$$

because it does matter what infinite
number we pick. Instead we
write

$$\lim_{x\to+\infty}\tan^{-1} x = \frac{\pi}{2}$$

$$\lim_{x\to-\infty}\tan^{-1} x = -\frac{\pi}{2}.$$

Some expressions don't have this 
kind of limit at all. For exam- 
ple, if you take the sines of big 
numbers like a thousand, a million, 
etc., on your calculator, the results
are essentially random numbers
lying between -1 and 1. They
don't settle down to any particular
value, because the sine function oscillates
back and forth forever. To
prove formally that $\lim_{x\to+\infty}\sin x$
is undefined, consider that the sine
function, defined on the real numbers,
has the property that you
can always change its result by at
least 0.1 if you add either 1.5 or
-1.5 to its input. For example,
sin(.8) ≈ 0.717, and sin(.8 - 1.5) ≈
-0.644. Applying the transfer
principle to this statement, we find
that the same is true on the hyperreals.
Therefore there cannot be
any value L that differs infinitesimally
from sin a for all positive infinite
values of a.

Often we're interested in finding 
the limit as x approaches infinity 
of an expression that is written as 
an indeterminate form like H/K, 
where both H and K are infinite. 
Example 43
> Evaluate the limit

$$\lim_{x\to\infty}\frac{2x + 7}{x + 8686}.$$

> Intuitively, if x gets large enough the
constant terms will be negligible, and
the top and bottom will be dominated
by the 2x and x terms, respectively,
giving an answer that approaches 2.

One way to verify this is to divide both
the top and the bottom by x, giving

$$\frac{2 + \frac{7}{x}}{1 + \frac{8686}{x}}.$$

If x is infinite, then the standard part
of the top is 2, the standard part of the
bottom is 1, and the standard part of
the whole thing is therefore 2.

Another approach is to use l'Hôpital's
rule. The derivative of the top is 2, and
the derivative of the bottom is 1, so the
limit is 2/1 = 2.

3.6 Generalizations of l'Hôpital's rule

Mathematical theorems are sometimes
like cars. I own a Honda Fit
that is about as bare-bones as you
can get these days, but persuading
a dealer to sell me that car
was like pulling teeth. The salesman
was absolutely certain that
any sane customer would want to
pay an extra $1,800 for such crucial
amenities as floor mats and a
chrome tailpipe. L'Hôpital's rule
in its most general form is a much
fancier piece of machinery than
the stripped down model described
on p. 60. The price you pay for
the deluxe model is that the proof
becomes much more complicated
than the one-liner that sufficed for
the simple version.



Multiple applications of the rule 

In the following example, we have
to use l'Hôpital's rule twice before
we get an answer.

Example 44
> Evaluate

$$\lim_{x\to\pi}\frac{1 + \cos x}{(x - \pi)^2}.$$

> Applying l'Hôpital's rule gives

$$\frac{-\sin x}{2(x - \pi)},$$

which still produces 0/0 when we plug
in x = π. Going again, we get

$$\frac{-\cos x}{2} = \frac{1}{2}.$$

The reason that this always works 
is outlined on p. 148. 
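
As a sanity check on example 44, we can evaluate the original expression and each differentiated version at points close to π. This is a numerical sketch only; the function names are hypothetical labels for the three stages of the calculation.

```python
import math


def original(x):
    """(1 + cos x)/(x - pi)^2, which gives 0/0 at x = pi."""
    return (1.0 + math.cos(x)) / (x - math.pi) ** 2


def after_one_step(x):
    """Derivatives of top and bottom: -sin x / (2(x - pi)), still 0/0 at pi."""
    return -math.sin(x) / (2.0 * (x - math.pi))


def after_two_steps(x):
    """Second derivatives of top and bottom: -cos x / 2."""
    return -math.cos(x) / 2.0


for h in [1e-1, 1e-2, 1e-3]:
    x = math.pi + h
    print(h, original(x), after_one_step(x), after_two_steps(x))

print(after_two_steps(math.pi))
```

All three columns head toward 1/2, but only the last expression can actually be evaluated at x = π.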

The indeterminate form ∞/∞

Consider an example like this:

$$\lim_{x\to 0}\frac{1 + 1/x}{1 + 2/x}$$

This is an indeterminate form like
∞/∞ rather than the 0/0 form
for which we've already proved
l'Hôpital's rule. As proved on
p. 150, l'Hôpital's rule applies to
examples like this as well.



Example 45
> Evaluate

$$\lim_{x\to 0}\frac{1 + 1/x}{1 + 2/x}.$$

> Both the numerator and the denominator
go to infinity. Differentiation
of the top and bottom gives
$(-x^{-2})/(-2x^{-2}) = 1/2$. We can see
that the reason the rule worked was
that (1) the constant terms were irrelevant
because they become negligible
as the 1/x terms blow up; and (2) differentiating
the blowing-up 1/x terms
makes them into the same $x^{-2}$ on top
and bottom, which cancel.

Note that we could also have gotten 
this result without l'Hopital's rule, sim- 
ply by multiplying both the top and the 
bottom of the original expression by x 
in order to rewrite it as (x + 1 )/(x + 2). 
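
Both routes to the answer in example 45 can be checked with a few lines of arithmetic. The sketch below evaluates the original ∞/∞ expression for small x and compares it with the rewritten form (x + 1)/(x + 2); the function names are mine.

```python
def original(x):
    """(1 + 1/x)/(1 + 2/x), undefined at x = 0."""
    return (1.0 + 1.0 / x) / (1.0 + 2.0 / x)


def rewritten(x):
    """The same expression after multiplying top and bottom by x."""
    return (x + 1.0) / (x + 2.0)


for x in [1e-2, 1e-4, 1e-6, 1e-9]:
    print(x, original(x), rewritten(x))

print(rewritten(0.0))
```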



Limits at infinity 

It is straightforward to prove a 
variant of l'Hopital's rule that al- 
lows us to do limits at infinity. The 
general proof is left as an exercise 
(problem 8, p. 67). The result is 
that l'Hopital's rule is equally valid 
when the limit is at ±∞ rather
than at some real number a.



Example 46 



> Evaluate 



lim 



2x + 7 
x + 8686 



> We could use a change of variable 
to make this into example 37 on p. 59, 
which was solved using an ad hoc and 
multiple-step procedure. But having 
established the more general form of 
l'Hopital's rule, we can do it in one 
step. Differentiation of the top and bot- 
tom produces 



2x + 7 _ 2 _ 

x^oc x + 8686 ~ T ~ 
Problems

1 (a) Prove, using the Weierstrass definition of the limit, that if $\lim_{x\to a} f(x) = F$ and $\lim_{x\to a} g(x) = G$ both exist, then $\lim_{x\to a} [f(x) + g(x)] = F + G$, i.e., that the limit of a sum is the sum of the limits. (b) Prove the same thing using the definition of the limit in terms of infinitesimals.

> Solution, p. 180

2 Sketch the graph of the function $e^{-1/x}$, and evaluate the following four limits:

$$\lim_{x\to 0^+} e^{-1/x} \qquad \lim_{x\to 0^-} e^{-1/x} \qquad \lim_{x\to +\infty} e^{-1/x} \qquad \lim_{x\to -\infty} e^{-1/x}$$

> Solution, p. 180
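3 Verify the following limits.

$$\lim_{s\to 1} \frac{s^3-1}{s-1} = 3$$

$$\lim_{\theta\to 0} \frac{1-\cos\theta}{\theta^2} = \frac{1}{2}$$

$$\lim_{x\to\infty} \frac{5x^2-2x}{x} = \infty$$

$$\lim_{n\to\infty} \frac{n(n+1)}{(n+2)(n+3)} = 1$$

$$\lim_{x\to\infty} \frac{ax^2+bx+c}{dx^2+ex+f} = \frac{a}{d}$$

> Solution, p. 181 [Granville, 1911]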
4 Evaluate 



lim — 

£C-S-0 1 



exactly, and check your result by 
numerical approximation. 

> Solution, p. 181 

5 Amy is asked to evaluate 

lim — 

x^O x 

She applies l'Hopital's rule, differ- 
entiating top and bottom to find 
l/e x , which equals 1 when she 
plugs in x = 0. What is wrong 
with her reasoning? 

> Solution, p. 182 

6 Evaluate 
lim 



u^a e u + e~ u - 2 

exactly, and check your result by 
numerical approximation. 

> Solution, p. 182 

7 Evaluate

$$\lim_{t\to\pi} \frac{\sin t}{t-\pi}$$

exactly, and check your result by numerical approximation.

> Solution, p. 182

8 Prove a form of l'Hôpital's rule stating that

$$\lim_{x\to\infty} \frac{f(x)}{g(x)}$$

is equal to the limit of f'/g' at infinity. Hint: change to some new variable u such that x → ∞ corresponds to u → 0.

> Solution, p. 182

9 Prove that the linear function y = ax + b, where a and b are real, is continuous, first using the definition of continuity in terms of infinitesimals, and then using the definition in terms of the Weierstrass limit. > Solution, p. 182



4 Integration 



4.1 Definite and indefinite integrals

Because any formula can be differ- 
entiated symbolically to find an- 
other formula, the main motiva- 
tion for doing derivatives numeri- 
cally would be if the function to 
be differentiated wasn't known in 
symbolic form. A typical exam- 
ple might be a two-person network 
computer game, in which player 
A's computer needs to figure out 
player B's velocity based on knowl- 
edge of how her position changes 
over time. But in most cases, it's 
numerical integration that's inter- 
esting, not numerical differentia- 
tion. 

As a warm-up, let's see how to do a running sum of a discrete function using Yacas. The following program computes the sum 1+2+...+100 discussed on page 7. Now that we're writing real computer programs with Yacas, it would be a good idea to enter each program into a file before trying to run it. In fact, some of these examples won't run properly if you just start up Yacas and type them in one line at a time. If you're using Adobe Reader to read this book, you can do Tools>Basic>Select, select the program, copy it into a file, and then edit out the line numbers.



Example 47

    1   n := 1;
    2   sum := 0;
    3   While (n<=100) [
    4     sum := sum+n;
    5     n := n+1;
    6   ];
    7   Echo(sum);





The semicolons are to separate one instruction from the next, and they become necessary now that we're doing real programming. Line 1 of this program defines the variable n, which will take on all the values from 1 to 100. Line 2 says that we haven't added anything up yet, so our running sum is zero so far. Line 3 says to keep on repeating the instructions inside the square brackets until n goes past 100. Line 4 updates the running sum, and line 5 updates the value of n. If you've never done any programming before, a statement like n:=n+1 might seem like nonsense: how can a number equal itself plus one? But that's why we use the := symbol; it says that we're redefining n, not stating an equation. If n was previously 37, then after this statement is executed, n will be redefined as 38. To run the program on a Linux computer, do this (assuming you saved the program in a file named sum.yacas):

    % yacas -pc sum.yacas
    5050

Here the % symbol is the computer's prompt. The result is 5,050, as expected. One way of stating this result is

$$\sum_{n=1}^{100} n = 5050$$

The capital Greek letter Σ, sigma, is used because it makes the "s" sound, and that's the first sound in the word "sum." The n = 1 below the sigma says the sum starts at 1, and the 100 on top says it ends at 100. The n is what's known as a
dummy variable: it has no mean- 
ing outside the context of the sum. 
Figure a shows the graphical inter- 
pretation of the sum: we're adding 
up the areas of a series of rectan- 
gular strips. (For clarity, the figure 
only shows the sum going up to 7, 
rather than 100.) 



a / Graphical interpretation of the sum 1 + 2 + ... + 7.

Now how about an integral? Figure b shows the graphical interpretation of what we're trying to do: find the area of the shaded triangle. This is an example we know how to do symbolically, so we can do it numerically as well, and check the answers against each other. Symbolically, the area is given by the integral. To integrate the function $\dot{x}(t) = t$, we know we need some function with a $t^2$ in it, since we want something whose derivative is t, and differentiation reduces the power by one. The derivative of $t^2$ would be 2t rather than t, so what we want is $x(t) = t^2/2$. Let's compute the area of the triangle that stretches along the t axis from 0 to 100: $x(100) = 100^2/2 = 5000$.




b / Graphical interpretation of the integral of the function $\dot{x}(t) = t$.



Figure c shows how to accomplish the same thing numerically. We break up the area into a whole bunch of very skinny rectangles. Ideally, we'd like to make the width of each rectangle be an infinitesimal number dt, so that we'd be adding up an infinite number of infinitesimal areas. In reality, a computer can't do that, so we divide up the interval from t = 0 to t = 100 into H rectangles, each with finite width dt = 100/H. Instead of making H infinite, we make it the largest number we can without making the computer take too long to add up the areas of the rectangles.




c / Approximating the in- 
tegral numerically. 



Example 48

    tmax := 100;
    H := 1000;
    dt := tmax/H;
    sum := 0;
    t := 0;
    While (t<=tmax) [
      sum := N(sum+t*dt);
      t := N(t+dt);
    ];
    Echo(sum);



In example 48, we split the interval from t = 0 to 100 into H = 1000 small intervals, each with width dt = 0.1. The result is 5,005, which agrees with the symbolic result to three digits of precision. Changing H to 10,000 gives 5,000.5, which is one more digit. Clearly as we make the number of rectangles greater and greater, we're converging to the correct result of 5,000.
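
The same convergence can be reproduced outside Yacas. What follows is only a minimal Python sketch, not a program from this book: the function name `rectangle_sum` is invented for the illustration, and the loop deliberately copies example 48's habit of also counting the rectangle at t = tmax.

```python
def rectangle_sum(f, a, b, H):
    """Add up f(t)*dt for t = a, a+dt, ..., b (H+1 rectangles, as in example 48)."""
    dt = (b - a) / H
    total = 0.0
    # Index the rectangles with an integer so that rounding in t
    # cannot silently drop the last rectangle.
    for i in range(H + 1):
        t = a + i * dt
        total += f(t) * dt
    return total

# Hypothetical driver: the same integrand and H values as in the text.
for H in (1000, 10000):
    print(H, rectangle_sum(lambda t: t, 0.0, 100.0, H))
```

The two printed values should come out close to 5,005 and 5,000.5, so the excess over 5,000 shrinks in proportion to 1/H.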

In the Leibniz notation, the thing we've just calculated, by two different techniques, is written like this:

$$\int_0^{100} t\,dt = 5{,}000$$

It looks a lot like the Σ notation, with the Σ replaced by a flattened-out letter "S." The t is a dummy
variable. What I've been casually 
referring to as an integral is re- 
ally two different but closely re- 
lated things, known as the definite 
integral and the indefinite integral. 

Definition of the indefinite integral
If $\dot{x}$ is a function, then a function x is an indefinite integral of $\dot{x}$ if, as implied by the notation, $dx/dt = \dot{x}$.

Interpretation: Doing an indefi- 
nite integral means doing the op- 
posite of differentiation. All the 
possible indefinite integrals are the 
same function except for an addi- 
tive constant. 



Example 49

> Find the indefinite integral of the function $\dot{x}(t) = t$.

> Any function of the form

$$x(t) = t^2/2 + c,$$

where c is a constant, is an indefinite integral of this function, since its derivative is t.

Definition of the definite integral
If $\dot{x}$ is a function, then the definite integral of $\dot{x}$ from a to b is defined as

$$\int_a^b \dot{x}(t)\,dt = \lim_{H\to\infty} \sum_{i=0}^{H} \dot{x}(a + i\Delta t)\,\Delta t,$$

where Δt = (b − a)/H.

Interpretation: What we're calculating is the area under the graph of $\dot{x}$, from a to b. (If the graph dips below the t axis, we interpret the area between it and the axis as a negative area.) The thing inside the limit is a calculation like the one done in example 48, but generalized to a ≠ 0. If H was infinite, then Δt would be an infinitesimal number dt.



4.2 The fundamental theorem of calculus

Let x be an indefinite integral of $\dot{x}$, and let $\dot{x}$ be continuous. Then

$$\int_a^b \dot{x}(t)\,dt = x(b) - x(a)$$

Interpretation: In the simple examples we've been doing so far, we were able to choose an indefinite integral such that x(0) = 0. In that case, x(t) is interpreted as the area from 0 to t, so in the expression x(b) − x(a), we're taking the area from 0 to b, but subtracting out the area from 0 to a, which gives the area from a to b. If we choose an indefinite integral with a different c, the c's will just cancel out anyway in the difference x(b) − x(a).



The fundamental theorem is 
proved on page 150. 



Example 50
> Interpret the definite integral

$$\int_1^2 \frac{1}{t}\,dt$$

graphically; then evaluate it both symbolically and numerically, and check that the two results are consistent.

> Figure d shows the graphical interpretation. The numerical calculation requires a trivial variation on the program from example 48:

    a := 1;
    b := 2;
    H := 1000;
    dt := (b-a)/H;
    sum := 0;
    t := a;
    While (t<=b) [
      sum := N(sum+(1/t)*dt);
      t := N(t+dt);
    ];
    Echo(sum);

d / The definite integral $\int_1^2 (1/t)\,dt$.

The result is 0.693897243, and 
increasing H to 10,000 gives 
0.6932221811, so we can be 
fairly confident that the result equals 
0.693, to 3 decimal places. 

Symbolically, the indefinite integral is x = ln t. Using the fundamental theorem of calculus, the area is ln 2 − ln 1 ≈ 0.693147180559945.

Judging from the graph, it looks plausible that the shaded area is about 0.7.

This is an interesting example, because the natural log blows up to negative infinity as t approaches 0, so it's not possible to add a constant onto the indefinite integral and force it to be equal to 0 at t = 0. Nevertheless, the fundamental theorem of calculus still works.
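
As a cross-check on the numbers in example 50, here is a hedged Python sketch of the same rectangle sum; it is an illustration rather than the book's Yacas program, and the helper name `rectangle_sum` is an assumption made for this example. It compares the numerical result with ln 2 − ln 1 from the fundamental theorem.

```python
import math

def rectangle_sum(f, a, b, H):
    """Sum f(t)*dt over t = a, a+dt, ..., b, including the endpoint as example 50 does."""
    dt = (b - a) / H
    return sum(f(a + i * dt) * dt for i in range(H + 1))

exact = math.log(2.0) - math.log(1.0)   # x(b) - x(a) with x = ln t
for H in (1000, 10000):
    approx = rectangle_sum(lambda t: 1.0 / t, 1.0, 2.0, H)
    # The difference from the exact area shrinks roughly tenfold when H grows tenfold.
    print(H, approx, approx - exact)
```

The sketch indexes the rectangles with an integer rather than repeatedly adding dt to t, which keeps floating-point rounding from changing how many rectangles are counted.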



4.3 Properties of the integral

Let f and g be two functions of x, and let c be a constant. We already know that for derivatives,

$$\frac{d}{dx}(f+g) = \frac{df}{dx} + \frac{dg}{dx}$$

and

$$\frac{d}{dx}(cf) = c\,\frac{df}{dx}.$$

But since the indefinite integral is just the operation of undoing a derivative, the same kind of rules must hold true for indefinite integrals as well:

$$\int (f+g)\,dx = \int f\,dx + \int g\,dx$$

and

$$\int (cf)\,dx = c\int f\,dx.$$



And since a definite integral can be 
found by plugging in the upper and 
lower limits of integration into the 
indefinite integral, the same prop- 
erties must be true of definite inte- 
grals as well. 

Example 51
> Evaluate the indefinite integral

$$\int (x + 2\sin x)\,dx$$

> Using the additive property, the integral becomes

$$\int x\,dx + \int 2\sin x\,dx$$

Then the property of scaling by a constant lets us change this to

$$\int x\,dx + 2\int \sin x\,dx$$

We need a function whose derivative is x, which would be $x^2/2$, and one whose derivative is sin x, which must be −cos x, so the result is

$$\frac{1}{2}x^2 - 2\cos x + c$$


4.4 Applications 

Averages 

In the story of Gauss's problem of 
adding up the numbers from 1 to 
100, one interpretation of the re- 
sult, 5,050, is that the average of 
all the numbers from 1 to 100 is 
50.5. This is the ordinary defini- 
tion of an average: add up all the 
things you have, and divide by the 
number of things. (The result in 
this example makes sense, because 
half the numbers are from 1 to 50, 
and half are from 51 to 100, so the 
average is half-way between 50 and 
51.) 

Similarly, a definite integral can also be thought of as a kind of average. In general, if y is a function of x, then the average, or mean, value of y on the interval from x = a to b can be defined as

$$\bar{y} = \frac{1}{b-a}\int_a^b y\,dx$$

In the continuous case, dividing by b − a accomplishes the same thing as dividing by the number of things in the discrete case.



Example 52

> Show that the definition of the average makes sense in the case where the function is a constant.

> If y is a constant, then we can take it outside of the integral, so

$$\begin{aligned}
\bar{y} &= \frac{1}{b-a}\int_a^b y\,dx \\
        &= \frac{1}{b-a}\,y\int_a^b 1\,dx \\
        &= \frac{1}{b-a}\,y\,(b-a) \\
        &= y
\end{aligned}$$



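Example 53
> Find the average value of the function $y = x^2$ for values of x ranging from 0 to 1.

>
$$\begin{aligned}
\bar{y} &= \frac{1}{1-0}\int_0^1 x^2\,dx \\
        &= \left.\frac{1}{3}x^3\right|_0^1 \\
        &= \frac{1}{3}
\end{aligned}$$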



The mean value theorem
If the continuous function y(x) has the average value $\bar{y}$ on the interval from x = a to b, then y attains its average value at least once in that interval, i.e., there exists ξ with a < ξ < b such that $y(\xi) = \bar{y}$.

The mean value theorem is proved on page 157. The special case in which $\bar{y} = 0$ is known as Rolle's theorem.

Example 54

> Verify the mean value theorem for $y = x^2$ on the interval from 0 to 1.

> The mean value is 1/3, as shown in example 53. This value is achieved at $x = \sqrt{1/3} = 1/\sqrt{3}$, which lies between 0 and 1.
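
Examples 53 and 54 can be checked numerically. The following Python sketch is an illustration only: the helper `average_value` and its use of the midpoint rule are assumptions of this example, not something taken from the text. It approximates the average of y = x² on [0, 1] and then solves ξ² = ȳ for the point promised by the mean value theorem.

```python
import math

def average_value(y, a, b, n=100000):
    """Approximate (1/(b-a)) * integral of y from a to b using the midpoint rule."""
    dx = (b - a) / n
    total = sum(y(a + (i + 0.5) * dx) for i in range(n))
    return total * dx / (b - a)

ybar = average_value(lambda x: x**2, 0.0, 1.0)
xi = math.sqrt(ybar)   # the point in (0, 1) where y(xi) equals the average
print(ybar, xi)
```

The printed values should be close to 1/3 and 1/√3 ≈ 0.577, and the second one does lie between 0 and 1.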



Work

In physics, work is a measure of the amount of energy transferred by a force; for example, if a horse sets a wagon in motion, the horse's force on the wagon is putting some energy of motion into the wagon. When a force F acts on an object that moves in the direction of the force by an infinitesimal distance dx, the infinitesimal work done is dW = F dx. Integrating both sides, we have $W = \int_a^b F\,dx$, where the force may depend on x, and a and b represent the initial and final positions of the object.

Example 55
> A spring compressed by an amount x relative to its relaxed length provides a force F = kx. Find the amount of work that must be done in order to compress the spring from x = 0 to x = a. (This is the amount of energy stored in the spring, and that energy will later be released into the toy bullet.)

>
$$\begin{aligned}
W &= \int_0^a F\,dx \\
  &= \int_0^a kx\,dx \\
  &= \frac{1}{2}ka^2
\end{aligned}$$

The reason W grows like $a^2$, not just like a, is that as the spring is compressed more, more and more effort is required in order to compress it.



Probability 

Mathematically, the probability that something will happen can be specified with a number ranging from 0 to 1, with 0 representing impossibility and 1 representing certainty. If you flip a coin, heads and tails both have probabilities of 1/2. The probabilities of all the possible outcomes have to add up to 1. This is called normalization.

e / Normalization: the probability of picking land plus the probability of picking water adds up to 1.

So far we've discussed random processes having only two possible outcomes: yes or no, win or lose, on or off. More generally, a random process could have a result that is a number. Some processes yield integers, as when you roll a die and get a result from one to six, but some are not restricted to whole numbers, e.g., the height of a human being, or the amount of time that a uranium-238 atom will exist before undergoing radioactive decay. The key to handling these continuous random variables is the concept of the area under a curve, i.e., an integral.

Consider a throw of a die. If the die is "honest," then we expect all six values to be equally likely. Since all six probabilities must add up to 1, the probability of any particular value coming up must be 1/6. We can summarize this in a graph, figure f. Areas under the curve can be interpreted as total probabilities. For instance, the area under the curve from 1 to 3 is 1/6+1/6+1/6 = 1/2, so the probability of getting a result from 1 to 3 is 1/2. The function shown on the graph is called the probability distribution.

f / Probability distribution for the result of rolling a single die.




g / Rolling two dice and adding them 
up. 



Figure g shows the probabilities of 
various results obtained by rolling 
two dice and adding them to- 
gether, as in the game of craps. 
The probabilities are not all the 
same. There is a small probability 
of getting a two, for example, be- 
cause there is only one way to do it, 
by rolling a one and then another 
one. The probability of rolling a 
seven is high because there are six 
different ways to do it: 1+6, 2+5, 
etc. 
If the number of possible outcomes is large but finite, for example the number of hairs on a dog, the graph would start to look like a smooth curve rather than a ziggurat.

What about probability distribu- 
tions for random numbers that are 
not integers? We can no longer 
make a graph with probability on 
the y axis, because the probabil- 
ity of getting a given exact num- 
ber is typically zero. For instance, 
there is zero probability that a per- 
son will be exactly 200 cm tall, 
since there are infinitely many possible results that are close to 200
but not exactly 200, for example
199.99999999687687658766. It
doesn't usually make sense, there- 
fore, to talk about the probability 
of a single numerical result, but it 
does make sense to talk about the 
probability of a certain range of re- 
sults. For instance, the probability 
that a randomly chosen person will 
be more than 170 cm and less than 
200 cm in height is a perfectly rea- 
sonable thing to discuss. We can 
still summarize the probability in- 
formation on a graph, and we can 
still interpret areas under the curve 
as probabilities. 

But the y axis can no longer be a unitless probability scale. In the example of human height, we want the x axis to have units of centimeters, and we want areas under the curve to be unitless probabilities. The area of a single square on the graph paper is then

    (unitless area of a square)
      = (width of square with distance units)
        × (height of square)

If the units are to cancel out, then the height of the square must evidently be a quantity with units of inverse centimeters. In other words, the y axis of the graph is to be interpreted as probability per unit height, not probability.

h / A probability distribution for human height. (Horizontal axis: height in cm, from 120 to 200.)

Another way of looking at it is that the y axis on the graph gives a derivative, dP/dx: the infinitesimally small probability that x will lie in the infinitesimally small range covered by dx.

Example 56
> A computer language will typically have a built-in subroutine that produces a fairly random number that is equally likely to take on any value in the range from 0 to 1. If you take the absolute value of the difference between two such numbers, the probability distribution is of the form dP/dx = k(1 − x). Find the value of the constant k that is required by normalization.

>
$$\begin{aligned}
1 &= \int_0^1 k(1-x)\,dx \\
  &= \left.\left(kx - \tfrac{1}{2}kx^2\right)\right|_0^1 \\
  &= k - k/2 \\
k &= 2
\end{aligned}$$

Self-Check
Compare the number of people with heights in the range of 130-135 cm to the number in the range 135-140. > Answer, p. 161

When one random variable is related to another in some mathematical way, the chain rule can be used to relate their probability distributions.

Example 57
> A laser is placed one meter away from a wall, and spun on the ground to give it a random direction, but if the angle u shown in figure j doesn't come out in the range from 0 to π/2, the laser is spun again until an angle in the desired range is obtained. Find the probability distribution of the distance x shown in the figure. The derivative $d\tan^{-1}z/dz = 1/(1+z^2)$ will be required (see example 63, page 86).

j / Example 57.

> Since any angle between 0 and π/2 is equally likely, the probability distribution dP/du must be a constant, and normalization tells us that the constant must be dP/du = 2/π.

The laser is one meter from the wall, so the distance x, measured in meters, is given by x = tan u. For the probability distribution of x, we have

$$\begin{aligned}
\frac{dP}{dx} &= \frac{dP}{du}\,\frac{du}{dx} \\
              &= \frac{2}{\pi}\,\frac{d\tan^{-1}x}{dx} \\
              &= \frac{2}{\pi(1+x^2)}
\end{aligned}$$

Note that the range of possible values of x theoretically extends from 0 to infinity. Problem 7 on page 102 deals with this.

i / The average can be interpreted as the balance point of the probability distribution.

If the next Martian you meet asks 
you, "How tall is an adult hu- 
man?," you will probably reply 
with a statement about the average 
human height, such as "Oh, about 
5 feet 6 inches." If you wanted to 
explain a little more, you could say, 
"But that's only an average. Most 
people are somewhere between 5 
feet and 6 feet tall." Without 
bothering to draw the relevant bell 
curve for your new extraterrestrial 
acquaintance, you've summarized 
the relevant information by giving 
an average and a typical range of 
variation. The average of a prob- 
ability distribution can be defined 
geometrically as the horizontal po- 
sition at which it could be balanced 
if it was constructed out of cardboard (figure i). This is a different way
of working with averages than the
one we did earlier. Before, when we had
a graph of y versus x, we implicitly
assumed that all values of x
were equally likely, and we found 
an average value of y. In this new 
method using probability distribu- 
tions, the variable we're averaging 



is on the x axis, and the y axis tells 
us the relative probabilities of the 
various x values. 

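For a discrete-valued variable with n possible values $x_i$, each occurring with probability $P_i$, the average would be

$$\bar{x} = \sum_i x_i P_i,$$

and in the case of a continuous variable whose possible values range from a to b, this becomes an integral,

$$\bar{x} = \int_a^b x\,\frac{dP}{dx}\,dx$$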



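Example 58
> For the situation described in example 56, find the average value of x.

>
$$\begin{aligned}
\bar{x} &= \int_0^1 x\,\frac{dP}{dx}\,dx \\
        &= \int_0^1 x \cdot 2(1-x)\,dx \\
        &= 2\int_0^1 (x - x^2)\,dx \\
        &= 2\left.\left(\frac{x^2}{2} - \frac{x^3}{3}\right)\right|_0^1 \\
        &= \frac{1}{3}
\end{aligned}$$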



Sometimes we don't just want to know the average value of a certain variable, we also want to have some idea of the amount of variation above and below the average. The most common way of measuring this is the standard deviation, defined by

$$\sigma = \sqrt{\int_a^b (x-\bar{x})^2\,\frac{dP}{dx}\,dx}$$

The idea here is that if there was no variation at all above or below the average, then the quantity $(x - \bar{x})$ would be zero whenever dP/dx was nonzero, and the standard deviation would be zero. The reason for taking the square root of the whole thing is so that the result will have the same units as x.

Example 59

> For the situation described in example 56, find the standard deviation of x.

> The square of the standard deviation is

$$\begin{aligned}
\sigma^2 &= \int_0^1 (x-\bar{x})^2\,\frac{dP}{dx}\,dx \\
         &= \int_0^1 (x - 1/3)^2 \cdot 2(1-x)\,dx \\
         &= \frac{1}{18}
\end{aligned}$$

so the standard deviation is

$$\sigma = \frac{1}{\sqrt{18}} \approx 0.236$$
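
The results of examples 58 and 59 lend themselves to a quick simulation. The Python sketch below is an assumption-laden illustration, not part of the book: it uses the standard library's `random.random()` as the "built-in subroutine" described in example 56, and the sample size and seed are arbitrary choices.

```python
import random
import statistics

random.seed(0)  # arbitrary fixed seed so that the run is repeatable

# Absolute difference of two uniform random numbers, as in example 56.
samples = [abs(random.random() - random.random()) for _ in range(200000)]

mean = statistics.fmean(samples)     # compare with the exact average 1/3
sigma = statistics.pstdev(samples)   # compare with 1/sqrt(18), about 0.236
print(mean, sigma)
```

With this many samples, the two estimates should agree with 1/3 and 0.236 to about two decimal places; the residual scatter is statistical, not an error in the formulas.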



Problems

1 Write a computer program 

similar to the one in example 50 
on page 72 to evaluate the definite 
integral 



> Solution, p. 183 

2 Evaluate the integral 

sin x dx , 

and draw a sketch to explain why 
your result comes out the way it 
does. > Solution, p. 183 

3 Sketch the graph that repre- 
sents the definite integral 



+ 2x 



and estimate the result roughly 
from the graph. Then evaluate the 
integral exactly, and check against 
your estimate. 

> Solution, p. 184 

4 Make a rough guess as to the average value of sin x for 0 < x < π, and then find the exact result and check it against your guess.

> Solution, p. 185 

5 Show that the mean value the- 
orem's assumption of continuity is 
necessary, by exhibiting a discon- 
tinuous function for which the the- 
orem fails. > Solution, p. 185 

6 Show that the fundamental theorem of calculus's assumption of continuity for $\dot{x}$ is necessary, by exhibiting a discontinuous function for which the theorem fails.

> Solution, p. 185 

7 Sketch the graphs of $y = x^2$ and $y = \sqrt{x}$ for 0 < x < 1. Graphically, what relationship should exist between the integrals $\int_0^1 x^2\,dx$ and $\int_0^1 \sqrt{x}\,dx$? Compute both integrals, and verify that the results are related in the expected way.

8 In a gasoline-burning car engine, the exploding air-gas mixture makes a force on the piston, and the force tapers off as the piston expands, allowing the gas to expand. (a) In the approximation F = k/x, where x is the position of the piston, find the work done on the piston as it travels from x = a to x = b, and show that the result only depends on the ratio b/a. This ratio is known as the compression ratio of the engine. (b) A better approximation, which takes into account the cooling of the air-gas mixture as it expands, is $F = kx^{-1.4}$. Compute the work done in this case.

Problem 8.

9 A certain variable x varies randomly from -1 to 1, with probability distribution $dP/dx = k(1 - x^2)$.

(a) Determine k from the requirement of normalization.

(b) Find the average value of x.

(c) Find its standard deviation.

10 Suppose that we've already 
established that the derivative of 
an odd function is even, and vice 
versa. (See problem 28, p. 49.) 
Something similar can be proved 
for integration. However, the fol- 
lowing is not quite right. 

Let f be even, and let g = ∫ f(x) dx
be its indefinite integral. Then by 
the fundamental theorem of calcu- 
lus, f is the derivative of g. Since 
we've already established that the 
derivative of an odd function is 
even, we conclude that g is odd. 

Find all errors in the proof. 

> Solution, p. 185 

11 A perfectly elastic ball 
bounces up and down forever, al- 
ways coming back up to the same 
height h. Find its average height. 



12 The figure shows a curve with a tangent line segment of length 1 that sweeps around it, forming a new curve that is usually outside the old one. Prove Holditch's theorem, which states that the new curve's area differs from the old one's by π. (This is an example of a result that is much more difficult to prove without making use of infinitesimals.) *

Problem 12.


5 Techniques 



5.1 Newton's method 

In the 1958 science fiction novel 
Have Space Suit — Will 
Travel, by Robert Heinlein, Kip 
is a high school student who wants 
to be an engineer, and his father is 
trying to convince him to stretch 
himself more if he wants to get any- 
thing out of his education: 

"Why did Van Buren fail of re- 
election? How do you extract the 
cube root of eighty-seven?" 

Van Buren had been a president; 
that was all I remembered. But I 
could answer the other one. "If 
you want a cube root, you look in 
a table in the back of the book. " 

Dad sighed. "Kip, do you think 
that table was brought down from 
on high by an archangel?" 

We no longer use tables to compute roots, but how does a pocket calculator do it? A technique called Newton's method allows us to calculate the inverse of any function efficiently, including cases that aren't preprogrammed into a calculator. In the example from the novel, we know how to calculate the function $y = x^3$ fairly accurately and quickly for any given value of x, but we want to turn the equation around and find x when y = 87. We start with a rough mental guess: since $4^3 = 64$ is a little too small, and $5^3 = 125$ is much too big, we guess x ≈ 4.3. Testing our guess, we have $4.3^3 = 79.5$. We want y to get bigger by 7.5, and we can use calculus to find approximately how much bigger x needs to get in order to accomplish that:

$$\begin{aligned}
\Delta x &= \frac{\Delta x}{\Delta y}\,\Delta y \\
         &\approx \frac{dx}{dy}\,\Delta y \\
         &= \frac{\Delta y}{dy/dx} \\
         &= \frac{\Delta y}{3x^2} \\
         &= 0.14
\end{aligned}$$

Increasing our value of x to 4.3 + 0.14 = 4.44, we find that $4.44^3 = 87.5$ is a pretty good approximation to 87. If we need higher precision, we can go through the process again with Δy = −0.5, giving

$$\Delta x \approx \frac{\Delta y}{3x^2} = -0.01, \qquad x = 4.43, \qquad x^3 = 86.9$$

This second iteration gives an excellent approximation.
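
The iteration just described is short enough to automate. The following Python sketch is an illustration under stated assumptions rather than anything from the book: the function name `newton_inverse`, the fixed number of steps, and the lambda arguments are all choices made for this example. It repeats the update Δx = Δy/(dy/dx) starting from the same guess x = 4.3.

```python
def newton_inverse(f, dfdx, y_target, x, steps=6):
    """Improve the guess x for f(x) = y_target using dx = dy / (dy/dx)."""
    for _ in range(steps):
        dy = y_target - f(x)      # how far y still has to move
        x += dy / dfdx(x)         # corresponding change in x, to first order
    return x

# Cube root of 87, starting from the mental guess used in the text.
root = newton_inverse(lambda x: x**3, lambda x: 3 * x**2, 87.0, 4.3)
print(root, root**3)
```

Because the sketch does not round intermediate values, its first two steps differ slightly from the hand calculation above, but it settles on the same root near 4.43, and the cube printed at the end should be 87 to many digits.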



Example 60
> Figure a shows the astronomer Johannes Kepler's analysis of the motion of the planets. The ellipse is the orbit of the planet around the sun. At t = 0, the planet is at its closest approach to the sun, A. At some later time, the planet is at point B. The angle x (measured in radians) is defined with reference to the imaginary circle encompassing the orbit. Kepler found the equation

$$2\pi\frac{t}{T} = x - e\sin x,$$

where the period, T, is the time required for the planet to complete a full orbit, and the eccentricity of the ellipse, e, is a number that measures how much it differs from a circle. The relationship is complicated because the planet speeds up as it falls inward toward the sun, and slows down again as it swings back away from it.

The planet Mercury has e = 0.206. Find the angle x when Mercury has completed 1/4 of a period.

a / Example 60.

> We have

$$y = x - (0.206)\sin x,$$

and we want to find x when y = 2π/4 = 1.57. As a first guess, we try x = π/2 (90 degrees), since the eccentricity of Mercury's orbit is actually much smaller than the example shown in the figure, and therefore the planet's speed doesn't vary all that much as it goes around the sun. For this value of x we have y = 1.36, which is too small by 0.21.

$$\begin{aligned}
\Delta x &\approx \frac{\Delta y}{dy/dx} \\
         &= \frac{0.21}{1 - (0.206)\cos x} \\
         &= 0.21
\end{aligned}$$

(The derivative dy/dx happens to be 1 at x = π/2.) This gives a new value of x, 1.57 + 0.21 = 1.78. Testing it, we have y = 1.58, which is correct to within rounding errors after only one iteration. (We were only supplied with a value of e accurate to three significant figures, so we can't get a result with precision better than about that level.)
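
For readers who want to see further iterations, here is a small Python sketch of Newton's method applied to Kepler's equation with the values from example 60. It is an illustration, not the author's calculation: the variable names and the choice of five steps are assumptions, and the extra digits it prints go beyond the three significant figures that e actually supports.

```python
import math

e = 0.206                 # eccentricity of Mercury's orbit, from the example
y_target = math.pi / 2    # 2*pi*t/T at one quarter of a period

def y(x):
    return x - e * math.sin(x)

def dydx(x):
    return 1.0 - e * math.cos(x)

x = math.pi / 2           # same first guess as in the example
for step in range(1, 6):
    x += (y_target - y(x)) / dydx(x)   # Newton update: dx = dy / (dy/dx)
    print(step, x, y(x))
```

The first step lands near 1.78, matching the hand calculation, and later steps only adjust digits beyond the precision of the given eccentricity.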

5.2 Implicit differentiation

We can differentiate any function that is written as a formula, and find a result in terms of a formula. However, sometimes the original problem can't be written in any nice way as a formula. For example, suppose we want to find dy/dx in a case where the relationship between x and y is given by the following equation:

$$y^7 + y = x^7 + x^2$$

There is no equivalent of the quadratic formula for seventh-order polynomials, so we have no way to solve for one variable in terms of the other in order to differentiate it. However, we can still find dy/dx in terms of x and y. Suppose we let x grow to x + dx. Then for example the $x^2$ term will grow to $(x + dx)^2 = x^2 + 2x\,dx + dx^2$. The squared infinitesimal is negligible, so the increase in $x^2$ was really just 2x dx, and we've really just computed the derivative of $x^2$ with respect to x and multiplied it by dx. In symbols,

$$d(x^2) = \frac{d(x^2)}{dx}\,dx = 2x\,dx$$

That is, the change in $x^2$ is 2x times the change in x. Doing this to both sides of the original equation, we have

$$\begin{aligned}
d(y^7 + y) &= d(x^7 + x^2) \\
7y^6\,dy + 1\,dy &= 7x^6\,dx + 2x\,dx \\
(7y^6 + 1)\,dy &= (7x^6 + 2x)\,dx \\
\frac{dy}{dx} &= \frac{7x^6 + 2x}{7y^6 + 1}
\end{aligned}$$

This still doesn't give us a formula for the derivative in terms of x alone, but it's not entirely useless. For instance, if we're given a numerical value of x, we can always use Newton's method to find y, and then evaluate the derivative.

5.3 Methods of integration

Change of variable

Sometimes an unfamiliar-looking integral can be made into a familiar one by substituting a new variable for an old one. For example, we know how to integrate 1/x (the answer is ln x), but what about

$$\int \frac{dx}{2x+1}\ ?$$

Let u = 2x + 1. Differentiating both sides, we have du = 2dx, or dx = du/2, so

$$\begin{aligned}
\int \frac{dx}{2x+1} &= \int \frac{du/2}{u} \\
                     &= \frac{1}{2}\ln u + c \\
                     &= \frac{1}{2}\ln(2x+1) + c
\end{aligned}$$

This technique is known as a change of variable or a substitution. (Because the letter u is often employed, you may also see it called u-substitution.)



In the case of a definite integral, 
we have to remember to change the 
limits of integration to reflect the 
new variable. 



Example 61
> Evaluate $\int_3^4 dx/(2x + 1)$.

> As before, let u = 2x + 1.

$$\int_{x=3}^{x=4} \frac{dx}{2x + 1} = \int_{u=7}^{u=9} \frac{du/2}{u} = \left.\frac{1}{2}\ln u\,\right|_{u=7}^{u=9}$$

Here the notation $|_{u=7}^{u=9}$ means to evaluate the function at 7 and 9, and subtract the former from the latter. The result is

$$\int_{x=3}^{x=4} \frac{dx}{2x + 1} = \frac{1}{2}(\ln 9 - \ln 7) = \frac{1}{2}\ln\frac{9}{7} .$$
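
The change of limits can be checked numerically. The sketch below is in Python (not a tool used in this book) and uses a simple midpoint rule; the helper `midpoint` is my own and is only meant as a rough check, not as a recommended integration routine.

```python
import math

def midpoint(f, a, b, n=10_000):
    """Midpoint-rule estimate of the integral of f from a to b."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

in_x = midpoint(lambda x: 1 / (2 * x + 1), 3, 4)   # original variable, x from 3 to 4
in_u = midpoint(lambda u: 0.5 / u, 7, 9)           # new variable, u from 7 to 9
exact = 0.5 * math.log(9 / 7)
print(in_x, in_u, exact)
```

All three numbers should agree. Integrating (1/2)/u from 3 to 4 instead, i.e., forgetting to change the limits, would give a different answer.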



Sometimes, as in the next example, 
a clever substitution is the secret to 
doing a seemingly impossible inte- 
gral. 



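Example 62
> Evaluate

$$\int \frac{e^{\sqrt{x}}}{\sqrt{x}}\,dx$$

> The only hope for reducing this to a form we can do is to let $u = \sqrt{x}$. Then $dx = d(u^2) = 2u\,du$, so

$$\int \frac{e^{\sqrt{x}}}{\sqrt{x}}\,dx = \int \frac{e^u}{u}\cdot 2u\,du = 2\int e^u\,du = 2e^u = 2e^{\sqrt{x}}$$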



Example 62 really isn't so tricky, 
since there was only one logical 
choice for the substitution that had 



any hope of working. The follow- 
ing is a little more dastardly. 



Example 63
> Evaluate

$$\int \frac{dx}{1 + x^2}$$

> The substitution that works is x = tan u. First let's see what this does to the expression $1 + x^2$. The familiar identity

$$\sin^2 u + \cos^2 u = 1$$

when divided by $\cos^2 u$, gives

$$\tan^2 u + 1 = \sec^2 u$$

so $1 + x^2$ becomes $\sec^2 u$. But differentiating both sides of x = tan u gives

$$dx = d\left[\sin u\,(\cos u)^{-1}\right]$$
$$= (d \sin u)(\cos u)^{-1} + (\sin u)\,d\left[(\cos u)^{-1}\right]$$
$$= (1 + \tan^2 u)\,du$$
$$= \sec^2 u\,du$$

so the integral becomes

$$\int \frac{dx}{1 + x^2} = \int \frac{\sec^2 u\,du}{\sec^2 u} = u + c = \tan^{-1} x + c$$
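
An antiderivative found by a clever substitution can always be spot-checked by differentiating it. The sketch below, in Python (not a tool used in this book), does this numerically for the results of examples 62 and 63; the helper `derivative` and the test points 0.7 and 2.0 are arbitrary choices of mine.

```python
import math

def derivative(F, x, h=1e-5):
    """Central-difference estimate of F'(x)."""
    return (F(x + h) - F(x - h)) / (2 * h)

# (claimed antiderivative, integrand, test point)
checks = [
    (lambda x: math.atan(x), lambda x: 1 / (1 + x**2), 0.7),
    (lambda x: 2 * math.exp(math.sqrt(x)),
     lambda x: math.exp(math.sqrt(x)) / math.sqrt(x), 2.0),
]
errors = [abs(derivative(F, x0) - f(x0)) for F, f, x0 in checks]
print(errors)
```

Small errors at a few test points don't prove the result, but a large one would show immediately that something went wrong in the algebra.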



What mere mortal would ever 
have suspected that the substitution
x = tan u was the one that
was needed in example 63? One 
possible answer is to give up and 
do the integral on a computer:

Integrate(x) 1/(1+x^2)
ArcTan(x)



Another possible answer is that 
you can usually smell the pos- 
sibility of this type of substitu- 
tion, involving a trig function, 
when the thing to be integrated 
contains something reminiscent of 
the Pythagorean theorem, as suggested
by figure b. The $1 + x^2$
looks like what you'd get if you 
had a right triangle with legs 1 and 
x, and were using the Pythagorean 
theorem to find its hypotenuse. 




b / The substitution x = tan u.



Example 64
> Evaluate $\int dx/\sqrt{1 - x^2}$.

> The $\sqrt{1 - x^2}$ looks like what you'd get if you had a right triangle with hypotenuse 1 and a leg of length x, and were using the Pythagorean theorem to find the other leg, as in figure c. This motivates us to try the substitution x = cos u, which gives $dx = -\sin u\,du$ and $\sqrt{1 - x^2} = \sqrt{1 - \cos^2 u} = \sin u$. The result is

$$\int \frac{dx}{\sqrt{1 - x^2}} = \int \frac{-\sin u\,du}{\sin u} = -u + c = -\cos^{-1} x + c$$

c / The substitution x = cos u.

Integration by parts 

Figure d shows a technique called integration by parts. If the integral $\int v\,du$ is easier than the integral $\int u\,dv$, then we can calculate the easier one, and then by simple geometry determine the one we wanted. Identifying the large rectangle that surrounds both shaded areas, and the small white rectangle on the lower left, we have

$$\int u\,dv = (\text{area of large rectangle}) - (\text{area of small rectangle}) - \int v\,du .$$



In the case of an indefinite integral, 
we have a similar relationship de- 
rived from the product rule: 

d(uv) = u dv + v du 
u dv = d(uv) — v du 

Integrating both sides, we have the 
following relation. 

Integration by parts

$$\int u\,dv = uv - \int v\,du$$

d / Integration by parts.
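
The relation can be tested numerically on a definite integral. The sketch below is in Python (not a tool used in this book); the choice u = x, dv = cos x dx on the interval from 0 to 1 is purely illustrative, and the `midpoint` helper is my own rough integrator.

```python
import math

def midpoint(f, a, b, n=10_000):
    """Midpoint-rule estimate of the integral of f from a to b."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

a, b = 0.0, 1.0
u = lambda x: x        # so du = dx
v = math.sin           # v = sin x, since dv = cos x dx

lhs = midpoint(lambda x: u(x) * math.cos(x), a, b)     # integral of u dv
rhs = u(b) * v(b) - u(a) * v(a) - midpoint(v, a, b)    # uv at the limits, minus integral of v du
print(lhs, rhs)
```

The two sides agree to within the accuracy of the crude integrator, as the geometric argument of figure d says they must.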



Since a definite integral can always be done by evaluating an indefinite integral at its upper and lower limits, one usually uses this form. Integrals don't usually come prepackaged in a form that makes it obvious that you should use integration by parts. What the equation for integration by parts tells us is that if we can split up the integrand into two factors, one of which (the dv) we know how to integrate, we have the option of changing the integral into a new form in which that factor becomes its integral, and the other factor becomes its derivative. If we choose the right way of splitting up the integrand into parts, the result can be a simplification.

Example 65
> Evaluate

$$\int x \cos x\,dx$$

> There are two obvious possibilities for splitting up the integrand into factors,

$$u\,dv = (x)(\cos x\,dx)$$

or

$$u\,dv = (\cos x)(x\,dx) .$$

The first one is the one that lets us make progress. If u = x, then du = dx, and if dv = cos x dx, then integration gives v = sin x.

$$\int x \cos x\,dx = \int u\,dv = uv - \int v\,du = x \sin x - \int \sin x\,dx = x \sin x + \cos x$$

Of the two possibilities we considered for u and dv, the reason this one helped was that differentiating x gave dx, which was simpler, and integrating cos x dx gave sin x, which was no more complicated than before. The second possibility would have made things worse rather than better, because integrating x dx would have given $x^2/2$, which would have been more complicated rather than less.

Example 66
> Evaluate $\int \ln x\,dx$.

> This one is a little tricky, because it isn't explicitly written as a product, and yet we can attack it using integration by parts. Let u = ln x and dv = dx.

$$\int \ln x\,dx = \int u\,dv = uv - \int v\,du = x \ln x - \int dx = x \ln x - x$$

Example 67
> Evaluate $\int x^2 e^x\,dx$.

> Integration by parts lets us split the integrand into two factors, integrate one, differentiate the other, and then do that integral. Integrating or differentiating $e^x$ does nothing. Integrating $x^2$ increases the exponent, which makes the problem look harder, whereas differentiating $x^2$ knocks the exponent down a step, which makes it look easier. Let $u = x^2$ and $dv = e^x\,dx$, so that $du = 2x\,dx$ and $v = e^x$. We then have

$$\int x^2 e^x\,dx = x^2 e^x - 2\int x e^x\,dx .$$

Although we don't immediately know how to evaluate this new integral, we can subject it to the same type of integration by parts, now with u = x and $dv = e^x\,dx$. After the second integration by parts, we have:

$$\int x^2 e^x\,dx = x^2 e^x - 2\left(x e^x - \int e^x\,dx\right) = x^2 e^x - 2(x e^x - e^x) = (x^2 - 2x + 2)e^x$$

Partial fractions



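Given a function like

$$\frac{-1}{x - 1} + \frac{1}{x + 1}$$

we can rewrite it over a common denominator like this:

$$\frac{-1}{x - 1} + \frac{1}{x + 1} = \left(\frac{-1}{x - 1}\right)\left(\frac{x + 1}{x + 1}\right) + \left(\frac{1}{x + 1}\right)\left(\frac{x - 1}{x - 1}\right) = \frac{-x - 1 + x - 1}{(x - 1)(x + 1)} = \frac{-2}{x^2 - 1}$$

But note that the original form is easily integrated to give

$$\int \left(\frac{-1}{x - 1} + \frac{1}{x + 1}\right) dx = -\ln(x - 1) + \ln(x + 1) + c ,$$

while faced with the form $-2/(x^2 - 1)$, we wouldn't have known how to integrate it.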

Note that the original function was of the form (-1)/... + (+1)/... It's not a coincidence that the two constants on top, -1 and +1, are opposite in sign but equal in absolute value. To see why, consider the behavior of this function for large values of x. Looking at the form $-1/(x - 1) + 1/(x + 1)$, we might naively guess that for a large value of x such as 1000, it would come out to be somewhere on the order of thousandths. But looking at the form $-2/(x^2 - 1)$, we would expect it to be way down in the millionths. This seeming paradox is resolved by noting that for large values of x, the two terms in the form $-1/(x - 1) + 1/(x + 1)$ very nearly cancel. This cancellation could only have happened if the constants on top were opposites like plus and minus one.

The idea of the method of partial fractions is that if we want to do an integral of the form

$$\int \frac{dx}{P(x)}$$

where P(x) is an nth order polynomial, we rewrite 1/P as

$$\frac{1}{P(x)} = \frac{A_1}{x - r_1} + \ldots + \frac{A_n}{x - r_n}$$

where $r_1 \ldots r_n$ are the roots of the polynomial, i.e., the solutions of the equation P(r) = 0. If the polynomial is second-order, you can find the roots $r_1$ and $r_2$ using the quadratic formula; I'll assume for the time being that they're real. For higher-order polynomials, there is no surefire, easy way of finding the roots by hand, and you'd be smart simply to use computer software to do it. In Yacas, you can find the real roots of a polynomial like this:

FindRealRoots(x^4-5*x^3-25*x^2+65*x+84)
{3.,7.,-4.,-1.}

(I assume it uses Newton's method to find them.) The constants $A_i$ can then be determined by algebra, or by the following trick.

Numerical method 

Suppose we evaluate 1/P(x) for a value of x very close to one of the roots. In the example of the polynomial $x^4 - 5x^3 - 25x^2 + 65x + 84$, let $r_1 \ldots r_4$ be the roots in the order in which they were returned by Yacas. Then $A_1$ can be found by evaluating 1/P(x) at x = 3.000001:

P(x):=x^4-5*x^3-25*x^2+65*x+84
N(1/P(3.000001))
-8928.5702094768

We know that for x very close to 3, the expression

$$\frac{1}{P} = \frac{A_1}{x - 3} + \frac{A_2}{x - 7} + \frac{A_3}{x + 4} + \frac{A_4}{x + 1}$$

will be dominated by the $A_1$ term, so

$$-8930 \approx \frac{A_1}{3.000001 - 3}$$
$$A_1 \approx (-8930)(10^{-6}) .$$

By the same method we can find the other three constants:

dx:=.000001
N(1/P(7+dx),30)*dx
0.2840908276e-2
N(1/P(-4+dx),30)*dx
-0.4329006192e-2
N(1/P(-1+dx),30)*dx
0.1041666664e-1

(The N( ,30) construct is to tell Yacas to do a numerical calculation rather than an exact symbolic one, and to use 30 digits of precision, in order to avoid problems with rounding errors.) Thus,

$$\frac{1}{P} = \frac{-8.93 \times 10^{-3}}{x - 3} + \frac{2.84 \times 10^{-3}}{x - 7} - \frac{4.33 \times 10^{-3}}{x + 4} + \frac{1.04 \times 10^{-2}}{x + 1} .$$

The desired integral is

$$\int \frac{dx}{P(x)} = -8.93 \times 10^{-3} \ln(x - 3) + 2.84 \times 10^{-3} \ln(x - 7) - 4.33 \times 10^{-3} \ln(x + 4) + 1.04 \times 10^{-2} \ln(x + 1) + c .$$

As in the simpler example I started off with, where P was second order and we got $A_1 = -A_2$, in this n = 4 example we expect that $A_1 + A_2 + A_3 + A_4 = 0$, for otherwise the large-x behavior of the partial-fraction form would be 1/x rather than $1/x^4$. This is a useful way of checking the result: $-8.93 + 2.84 - 4.33 + 10.4 = -.02 \approx 0$.
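
The same trick can be carried out in any language with floating-point arithmetic. The following is a minimal sketch in Python rather than Yacas; the variable names are my own, and ordinary double precision is assumed to be adequate here in place of the 30 digits requested from Yacas.

```python
def P(x):
    return x**4 - 5 * x**3 - 25 * x**2 + 65 * x + 84

roots = [3.0, 7.0, -4.0, -1.0]          # in the order returned by Yacas
eps = 1e-6                              # small but finite offset from each root
A = [eps / P(r + eps) for r in roots]   # A_i is approximately (1/P(r_i + eps)) * eps
print(A)
print(sum(A))                           # expected to be close to zero

# Spot-check the decomposition at a point well away from the roots.
x = 10.0
direct = 1 / P(x)
from_parts = sum(a / (x - r) for a, r in zip(A, roots))
print(direct, from_parts)
```

Both checks, the vanishing sum and the agreement at x = 10, are cheap ways of catching a mistyped root or coefficient.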



Complications

There are two possible complications:

First, the same factor may occur more than once, as in $x^3 - 5x^2 + 7x - 3 = (x - 1)(x - 1)(x - 3)$. In this example, we have to look for an answer of the form $A/(x - 1) + B/(x - 1)^2 + C/(x - 3)$, the solution being $-.25/(x - 1) - .5/(x - 1)^2 + .25/(x - 3)$.

Second, the roots may be complex. This is no show-stopper if you're using computer software that handles complex numbers gracefully. (You can choose a c that makes the result real.) In fact, as discussed in section 8.3, some beautiful things can happen with complex roots. But as an alternative, any polynomial with real coefficients can be factored into linear and quadratic factors with real coefficients. For each quadratic factor Q(x), we then have a partial fraction of the form (A + Bx)/Q(x), where A and B can be determined by algebra. In Yacas, this can be done using the Apart function.

Example 68
> Evaluate the integral

$$\int \frac{dx}{x^4 - 8x^3 + 8x^2 - 8x + 7}$$

using the method of partial fractions.

> First we use Yacas to look for real roots of the polynomial:

FindRealRoots(x^4-8*x^3+8*x^2-8*x+7)
{1.,7.}



Unfortunately this polynomial seems to have only two real roots; the rest are complex. We can divide out the factor (x - 1)(x - 7), but that still leaves us with a second-order polynomial, which has no real roots. One approach would be to factor the polynomial into the form (x - 1)(x - 7)(x - p)(x - q), where p and q are complex, as in section 8.3. Instead, let's use Yacas to expand the integrand in terms of partial fractions:

Apart(1/(x^4-8*x^3+8*x^2-8*x+7))
((2*x)/25+3/50)/(x^2+1)+1/(300*(x-7))+(-1)/(12*(x-1))

We can now rewrite the integral like this:

$$\frac{2}{25}\int \frac{x\,dx}{x^2 + 1} + \frac{3}{50}\int \frac{dx}{x^2 + 1} + \frac{1}{300}\int \frac{dx}{x - 7} - \frac{1}{12}\int \frac{dx}{x - 1}$$

which we can evaluate as follows:

$$\frac{1}{25}\ln(x^2 + 1) + \frac{3}{50}\tan^{-1} x + \frac{1}{300}\ln(x - 7) - \frac{1}{12}\ln(x - 1) + c$$





In fact, Yacas should be able to do 
the whole integral for us from scratch, 
but it's best to understand how these 



things work under the hood, and to 
avoid being completely dependent on 
one particular piece of software. As 
an illustration of this gem of wisdom, 
I found that when I tried to make Ya- 
cas evaluate the integral in one gulp, 
it choked because the calculation be- 
came too complicated! Because I un- 
derstood the ideas behind the proce- 
dure, I was still able to get a result 
through a mixture of computer calcu- 
lations and working it by hand. Some- 
one who didn't have the knowledge of 
the technique might have tried the in- 
tegral using the software, seen it fail, 
and concluded, incorrectly, that the in- 
tegral was one that simply couldn't be 
done. A computer is no substitute for 
understanding. 



Residue method 

On p. 90 I introduced the trick of carrying out the method of partial fractions by evaluating 1/P(x) numerically at $x = r_i + \epsilon$, near where 1/P blows up. Sometimes we would like to have an exact result rather than a numerical approximation. We can accomplish this by using an infinitesimal number dx rather than a small but finite $\epsilon$. For simplicity, let's assume that all of the n roots $r_i$ are distinct, and that P's highest-order term is $x^n$. We can then write P as the product $P(x) = (x - r_1)(x - r_2) \ldots (x - r_n)$. For products like this, there is a notation $\Pi$ (capital Greek letter "pi") that works like $\Sigma$ does for sums:

$$P(x) = \prod_{i=1}^{n} (x - r_i)$$

It's not necessary that the roots be real, but for now we assume that they are. We want to find the coefficients $A_i$ such that

$$\frac{1}{P(x)} = \sum_{i=1}^{n} \frac{A_i}{x - r_i} .$$

We then have, using $P(r_i) = 0$ so that $P(r_i + dx) = P'(r_i)\,dx$ to first order,

$$\frac{1}{P(r_i + dx)} = \frac{A_i}{dx} + \ldots$$
$$\frac{1}{P'(r_i)\,dx} = \frac{A_i}{dx} + \ldots$$

where ... represents finite terms that are negligible compared to the infinite ones. Multiplying on both sides by dx, we have

$$\frac{1}{P'(r_i)} = A_i + \ldots$$

where the ... now stand for infinitesimals which must in fact cancel out, since both $A_i$ and 1/P' are real numbers.



Example 69 
> The partial-fraction decomposition of the function

$$\frac{1}{x^4 - 5x^3 - 25x^2 + 65x + 84}$$

was found numerically on p. 90. The coefficient of the 1/(x - 3) term was found numerically to be $A_1 \approx -8.930 \times 10^{-3}$. Determine it exactly using the residue method.

> Differentiation gives $P'(x) = 4x^3 - 15x^2 - 50x + 65$. We then have $A_1 = 1/P'(3) = -1/112$.
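
The same recipe gives all four coefficients exactly. The sketch below uses Python's exact rational arithmetic rather than Yacas; the names `P_prime` and `A` are my own, and the integer roots are the ones found earlier for this polynomial.

```python
from fractions import Fraction

def P_prime(x):
    """Derivative of x^4 - 5x^3 - 25x^2 + 65x + 84."""
    return 4 * x**3 - 15 * x**2 - 50 * x + 65

roots = [3, 7, -4, -1]
A = [Fraction(1, P_prime(r)) for r in roots]   # A_i = 1/P'(r_i), kept as exact fractions
print(A)
print(sum(A))   # the coefficients should add up to exactly zero
```

Converting the fractions to decimals reproduces the numerical values found on p. 90, and the exactly vanishing sum confirms the large-x argument given there.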



Integrals that can't be done 

Integral calculus was invented in 
the age of powdered wigs and harp- 
sichords, so the original emphasis 
was on expressing integrals in a 
form that would allow numbers to 
be plugged in for easy numerical 
evaluation by scribbling on scraps 
of parchment with a quill pen. 
This was an era when you might 
have to travel to a large city to get 
access to a table of logarithms. 

In this computationally impov- 
erished environment, one always 
wanted to get answers in what's 
known as closed form and in terms 
of elementary functions. 

A closed form expression means 
one written using a finite num- 
ber of operations, as opposed to 
something like the geometric series 
1 + x + x 2 +x 3 + . . ., which goes on 
forever. 

Elementary functions are usually taken to be addition, subtraction, multiplication, division, logs, and exponentials, as well as other functions derivable from these. For example, a cube root is allowed, since $\sqrt[3]{x} = e^{(1/3)\ln x}$, and so are trig functions and their inverses, since, as we will see in chapter 8, they can be expressed in terms of logs and exponentials.

In theory, "closed form" doesn't 
mean anything unless we state the 
elementary functions that are al- 
lowed. In practice, when people 
refer to closed form, they usually 
have in mind the particular set 
of elementary functions described 
above. 

A traditional freshman calculus 
course spends such a vast amount 
of time teaching you how to do in- 
tegrals in closed form that it may 
be easy to miss the fact that this 
is impossible for the vast majority 
of integrands that you might ran- 
domly write down. Here are some 
examples of impossible integrals: 



$$\int e^{-x^2}\,dx$$

$$\int x^x\,dx$$

$$\int \frac{\sin x}{x}\,dx$$

$$\int e^x \tan x\,dx$$



The first of these is a form that 
is extremely important in statis- 
tics (it describes the area under the 
standard "bell curve" ) , so you can 
see that impossible integrals aren't 
just obscure things that don't pop 
up in real life. 



People who are proficient at doing integrals in closed form generally seem to work by a process of pattern matching. They recognize certain integrals as being of a form that can't be done, so they know not to try.

Example 70
> Students! Stand at attention! You will now evaluate $\int e^{-x^2 + 7x}\,dx$ in closed form.

> No sir, I can't do that. By a change of variables of the form u = x + c, where c is a constant, we could clearly put this into the form $\int e^{-x^2}\,dx$, which we know is impossible.

Sometimes an integral such as $\int e^{-x^2}\,dx$ is important enough that we want to give it a name, tabulate it, and write computer subroutines that can evaluate it numerically. For example, statisticians define the "error function" $\mathrm{erf}(x) = (2/\sqrt{\pi})\int e^{-x^2}\,dx$. Sometimes if you're not sure whether an integral can be done in closed form, you can put it into computer software, which will tell you that it reduces to one of these functions. You then know that it can't be done in closed form. For example, if you ask the popular web site integrals.com to do $\int e^{-x^2 + 7x}\,dx$, it spits back $(1/2)e^{49/4}\sqrt{\pi}\,\mathrm{erf}(x - 7/2)$. This tells you both that you shouldn't be wasting your time trying to do the integral in closed form and that if you need to evaluate it numerically, you can do that using the erf function.

As shown in the following example, just because an indefinite integral can't be done, that doesn't mean that we can never do a related definite integral.

Example 71
> Evaluate $\int_0^{\pi/2} e^{-\tan^2 x}(\tan^2 x + 1)\,dx$.

> The obvious substitution to try is u = tan x, and this reduces the integrand to $e^{-u^2}$. This proves that the corresponding indefinite integral is impossible to express in closed form. However, the definite integral can be expressed in closed form; it turns out to be $\sqrt{\pi}/2$. The trick for proving this is given in example 96 on p. 130.
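
Both claims, the erf form quoted from integrals.com and the value $\sqrt{\pi}/2$ in example 71, can be checked numerically. The sketch below is in Python (not a tool used in this book) and relies on the standard library's `math.erf`; the midpoint integrator and the test interval from 3 to 4 are my own choices.

```python
import math

def midpoint(f, a, b, n=100_000):
    """Midpoint-rule estimate of the integral of f from a to b."""
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

def F(x):
    """The antiderivative quoted in the text, written with erf."""
    return 0.5 * math.exp(49 / 4) * math.sqrt(math.pi) * math.erf(x - 3.5)

a, b = 3.0, 4.0
numeric = midpoint(lambda x: math.exp(-x**2 + 7 * x), a, b)
closed = F(b) - F(a)

def g(x):
    """Integrand of example 71."""
    t2 = math.tan(x) ** 2
    return math.exp(-t2) * (t2 + 1)

example_71 = midpoint(g, 0.0, math.pi / 2)
print(numeric, closed)
print(example_71, math.sqrt(math.pi) / 2)
```

The midpoint rule never evaluates the integrand exactly at $\pi/2$, where tan x blows up, which is why such a simple integrator is good enough here.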

Sometimes computer software 
can't say anything about a partic- 
ular integral at all. That doesn't 
mean that the integral can't be 
done. Computers are stupid, 
and they may try brute-force 
techniques that fail because the 
computer runs out of memory 
or CPU time. For example, the 
integral $\int dx/(x^{10000} - 1)$ (problem
14, p. 124) can be done in
closed form using the techniques 
of chapter 8, and it's not too hard 
for a proficient human to figure 
out how to attack it, but every 
computer program I've tried it on 
has failed silently. 






Problems 

1 Graph the function $y = e^x - 7x$ and get an approximate idea of
where any of its zeroes are (i.e., for
what values of x we have y(x) = 0).
Use Newton's method to find the 
zeroes to three significant figures of 
precision. 



2 The relationship between x and
y is given by $xy = \sin y + x^2 y^2$.

(a) Use Newton's method to find
the nonzero solution for y when
x = 3. Answer: y = 0.2231

(b) Find dy/dx in terms of x and 
y, and evaluate the derivative at 
the point on the curve you found in 
part a. Answer: dy/dx = —0.0379 
Based on an example by Craig B. 
Watkins. 

3 Suppose you want to evaluate 
dx 



1 + sin 2x 



and you've found 



da; 



1 + sin x 



tan 



in a table of integrals. Use a 
change of variable to find the an- 
swer to the original problem. 



4 Evaluate

$$\int \frac{\sin x\,dx}{1 + \cos x}$$



6 Evaluate 



Vi 4 + 6a; 2 da; 



where b is a constant. 



7 Evaluate 



/ 



da; 



8 Evaluate 



s x da; 



9 Use integration by parts to 
evaluate the following integrals. 



_1 x dx 



-1 x dx 



tan x dx 



10 Evaluate 

x sin x dx 



Hint: Use integration by parts 
more than once. 



11 Evaluate 



dx 



5 Evaluate

$$\int \frac{\sin x\,dx}{1 + \cos^2 x}$$



12 Evaluate 



dx 
^x~ 2 
13 Evaluate



/ 



dx 



Ax 



14 Apply integration by parts 
twice to 



: cos x dx 



examine what happens, and ma- 
nipulate the result in order to solve 
the original integral. (An approach 
that doesn't rely on tricks is given 
in example 88 on p. 121.) 

15 Plan, but do not actually carry out the steps that would be required in order to generalize the result of example 67 on p. 89 in order to evaluate

$$\int x^a b^x\,dx$$

where a and b are constants. Which is easier, the generalization from 2 to a, or the one from e to b? Do we need to introduce any restrictions on a or b?

> Solution, p. 186 

16 The integral $\int e^{-x^2}\,dx$ can't be done in closed form. Knowing this, use a change of variable to write down a different integral that also can't be done in closed form.

17 Consider the integral

$$\int e^{x^p}\,dx$$

where p is a constant. There is an obvious substitution. If this is to result in an integral that can be evaluated in closed form by a series of integrations by parts, what are the possible values of p? Don't actually complete the integral; just determine what values of p will work. > Solution, p. 186
6 Improper integrals



6.1 Integrating a 
function that 
blows up 



When we integrate a function that 
blows up to infinity at some point 
in the interval we're integrating, 
the result may be either finite or 
infinite. 



Example 72
> Integrate the function $y = 1/\sqrt{x}$ from x = 0 to x = 1.

> The function blows up to infinity at one end of the region of integration, but let's just try evaluating it, and see what happens.

$$\int_0^1 x^{-1/2}\,dx = \left. 2x^{1/2} \right|_0^1 = 2$$

The result turns out to be finite. Intuitively, the reason for this is that the spike at x = 0 is very skinny, and gets skinny fast as we go higher and higher up.

a / The integral $\int_0^1 dx/\sqrt{x}$ is finite.

Example 73
> Integrate the function $y = 1/x^2$ from x = 0 to x = 1.

$$\int_0^1 x^{-2}\,dx = \left. -x^{-1} \right|_0^1 = -1 + \frac{1}{0}$$

Division by zero is undefined, so the result is undefined.

Another way of putting it, using the hyperreal number system, is that if we were to integrate from $\epsilon$ to 1, where $\epsilon$ was an infinitesimal number, then the result would be $-1 + 1/\epsilon$, which is infinite. The smaller we make $\epsilon$, the bigger the infinite result we get out.

Intuitively, the reason that this integral comes out infinite is that the spike at x = 0 is fat, and doesn't get skinny fast enough.

b / The integral $\int_0^1 dx/x^2$ is infinite.

These two examples were examples 
of improper integrals. 
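
The contrast between the two examples shows up clearly if the lower limit is replaced by a small but finite number. The sketch below is in Python (not a tool used in this book); it simply evaluates the two antiderivatives, and the function names and the sample values of the cutoff are my own.

```python
import math

def finite_case(eps):
    """Integral of x**(-1/2) from eps to 1, from the antiderivative 2*sqrt(x)."""
    return 2.0 - 2.0 * math.sqrt(eps)

def infinite_case(eps):
    """Integral of x**(-2) from eps to 1, from the antiderivative -1/x."""
    return -1.0 + 1.0 / eps

for eps in (1e-2, 1e-4, 1e-6, 1e-8):
    print(eps, finite_case(eps), infinite_case(eps))
```

As the cutoff shrinks, the first column of results settles down toward 2 while the second grows without limit.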

6.2 Limits of 
integration at 
infinity 

Another type of improper integral is one in which one of the limits of integration is infinite. The notation

$$\int_a^\infty f(x)\,dx$$

means the limit of $\int_a^H f(x)\,dx$, where H is made to grow bigger and bigger. Alternatively, we can think of it as an integral in which the top end of the interval of integration is an infinite hyperreal number. A similar interpretation applies when the lower limit is $-\infty$, or when both limits are infinite.



Example 74
> Evaluate

$$\int_1^\infty x^{-2}\,dx$$

$$\int_1^H x^{-2}\,dx = \left. -x^{-1} \right|_1^H = -\frac{1}{H} + 1$$

As H gets bigger and bigger, the result gets closer and closer to 1, so the result of the improper integral is 1.

Note that this is the same graph as in example 72, but with the x and y axes interchanged; this shows that the two different types of improper integrals really aren't so different.

c / The integral $\int_1^\infty dx/x^2$ is finite.



Example 75 
> Newton's law of gravity states that the gravitational force between two objects is given by $F = Gm_1 m_2/r^2$, where G is a constant, $m_1$ and $m_2$ are the objects' masses, and r is the center-to-center distance between them. Compute the work that must be done to take an object from the earth's surface, at r = a, and remove it to $r = \infty$.

$$W = \int_a^\infty \frac{Gm_1 m_2}{r^2}\,dr = Gm_1 m_2 \int_a^\infty r^{-2}\,dr = \left. -Gm_1 m_2\, r^{-1} \right|_a^\infty = \frac{Gm_1 m_2}{a}$$



The answer is inversely proportional 
to a. In other words, if we were able to 
start from higher up, less work would 
have to be done. 
Problems

1 Integrate 

e~ x dx 



or show that it diverges. 

2 Integrate 

f T • 

or show that it diverges. 

3 Integrate 
" l dx 



or show that it diverges. 
4 Integrate 

x 2 2 x dx 



or show that it diverges. 

> Solution, p. 186 



5 Integrate 



cos x dx 



or show that it diverges. (Problem 
14 on p. 97 suggests a trick for do- 
ing the indefinite integral.) 



6 Prove that 



dx 



converges, but don't evaluate it. 



(b) Find the average value of x, or 
show that it diverges. 

(c) Find the standard deviation of 
x, or show that it diverges. 

8 Prove 



7 (a) Verify that the probability 
distribution dP/dx given in exam- 
ple 57 on page 78 is properly nor- 
malized. 



7 Sequences and Series 



7.1 Infinite 
sequences 

Consider an infinite sequence of 
numbers like 1/2, 2/3, 3/4, 4/5, 
. . . We want to define this as ap- 
proaching 1, or "converging to 1." 
The way to do this is to make a 
function f(n), which is only well 
defined for integer values of n. 
Then f(1) = 1/2, f(2) = 2/3, and
in general f(n) = n/(n + 1). With
just a little tinkering, our defini- 
tions of limits can be applied to 
this type of function (see problem 
1 on page 112). 

7.2 Infinite series 

A related question is how to rigor- 
ously define the sum of infinitely 
many numbers, which is referred 
to as an infinite series. An exam- 
ple is the geometric series 1 + x + 
x 2 + x 3 + . . . = 1/(1 — x), which 
we used casually on page 29. The 
general concept of an infinite series 
goes back to ancient Greek math- 
ematics. Various supposed para- 
doxes about infinite series, such as 
Zeno's paradox, were exhibited, in- 
fluencing Euclid to sidestep the is- 
sue in his Elements, where in Book 
IX, Proposition 35 he provides only 
an expression $(1 - x^n)/(1 - x)$ for
the nth partial sum of the geo- 
metric series. The case where n 
gets so big that x n becomes neg- 



ligible is left to the reader's imag- 
ination, as in one of those scenes 
in a romance novel that ends with 
something like "...and she surren- 
dered..." For those with modern 
training, the idea is that an infi- 
nite sum like 1 + 1 + 1 + . . . would 
clearly give an infinite result, but 
this is only because the terms are 
all staying the same size. If the 
terms get smaller and smaller, and 
get smaller fast enough, then the 
result can be finite. For example, 
consider the geometric series in the 
case where x = 1/2, for which we 
expect the result 1/(1 - 1/2) = 2. 
We have 

$$1 + \frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16} + \ldots ,$$

which at the successive steps of addition
equals 1, 1 1/2, 1 3/4, 1 7/8, 1 15/16,
.... We're getting closer and closer 
to 2, cutting the distance in half 
at each step. Clearly we can get as 
close as we like to 2, if we're willing 
to add enough terms. 
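
The halving of the remaining distance can be seen exactly with rational arithmetic. This is a small sketch in Python (not a tool used in this book); the variable names are my own.

```python
from fractions import Fraction

partial = Fraction(0)
gaps = []
for n in range(6):
    partial += Fraction(1, 2**n)   # add the next term, 1/2^n
    gaps.append(2 - partial)       # distance still separating the partial sum from 2
print(gaps)
```

Each gap is exactly half the previous one, so no partial sum ever reaches or passes 2.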

Note that we ended up wanting to 
talk about the partial sums of the 
series. This is the right way to get 
a rigorous definition of the conver- 
gence of series in general. In the 
case of the geometric series, for ex- 
ample, we can define a sequence of 
the partial sums 1, 1 + x, 1 + x + x 2 , 
. . . We can then define convergence 
and limits of series in terms of convergence
and limits of the partial sums.

It's instructive to see what hap- 
pens to the geometric series with 
x = 0.1. The geometric series be- 
comes 

1 + 0.1 +0.01 + 0.001 + ... 

The partial sums are 1, 1.1, 1.11, 
1.111, ... We can see vividly here 
that adding another term will only 
affect the result in a certain deci- 
mal place, without affecting any of 
the earlier ones. For instance, if 
we needed a result that was valid 
to three digits past the decimal 
place, we could stop at 1.111, be- 
ing assured that we had attained a 
good enough approximation. If we 
wanted an exact result, we could 
also observe that multiplying the 
result by 9 would give 9.999..., 
which is the same as 10, so the 
result must be 10/9, which is in 
agreement with 1/(1 — 1/10) = 
10/9. 

One thing to watch out for with 
infinite series is that the axioms of 
the real number system only talk 
about finite sums, so it's easy to 
get wrong results by attempting 
to apply them to infinite ones (see 
problem 2 on page 112). 

7.3 Tests for 
convergence 

There are many different tests that 
can be used to determine whether 
a sequence or series converges. I'll 
briefly state three of the most use- 
ful, with sketches of their proofs. 



Bounded and increasing sequences: 
A sequence that always increases, 
but never surpasses a certain value, 
converges. 

This amounts to a restatement of 
the compactness axiom for the real 
numbers stated on page 153, and 
is therefore to be interpreted not 
so much as a statement about se- 
quences but as one about the real 
number system. In particular, it 
fails if interpreted as a statement 
about sequences confined entirely 
to the rational number system, 
as we can see from the sequence 
1, 1.4, 1.41, 1.414, ... consisting
of the successive decimal approximations
to $\sqrt{2}$, which does not
converge to any rational-number 
value. 

Example 76 

> Prove that the geometric series 1 + 1/2 + 1/4 + ... converges.

> The sequence of partial sums is in- 
creasing, since each term is positive. 
Each term closes half of the remain- 
ing gap separating the previous par- 
tial sum from 2, so the sum never sur- 
passes 2. Since the partial sums are 
increasing and bounded, they con- 
verge to a limit. 

Once we know that a particular se- 
ries converges, we can also easily 
infer the convergence of other se- 
ries whose terms get smaller faster. 
For example, we can be certain 
that if the geometric series con- 
verges, so does the series 



$$\frac{1}{1} + \frac{1}{1 \times 2} + \frac{1}{1 \times 2 \times 3} + \ldots ,$$

whose terms get smaller faster than any base raised to the power n.

Alternating series with terms approaching zero: If the terms of a series alternate in sign and decrease steadily in absolute value toward zero, then the series converges.

Sketch of a proof: The even par- 
tial sums form an increasing se- 
quence, the odd sums a decreas- 
ing one. Neither of these sequences 
of partial sums can be unbounded, 
since the difference between partial 
sums n and n + 1 would then have 
to be unbounded, but this differ- 
ence is simply the nth term, and 
the terms approach zero. Since 
the even partial sums are increas- 
ing and bounded, they converge 
to a limit, and similarly for the 
odd ones. The two limits must 
be equal, since the terms approach 
zero. 



Example 77

> Prove that the series 1 - 1/2 + 1/3 - 1/4 + ... converges.

> Its convergence follows because it is an alternating series whose terms decrease toward zero. The sum turns out to be ln 2, although the convergence of the series is so slow that an extremely large number of terms is required in order to obtain a decent approximation.
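To get a feeling for how slow "slow" is, here is a minimal Python sketch that adds up the series term by term. The function name `alternating_harmonic` is my own choice for illustration, not something defined elsewhere in the book.

```python
import math

def alternating_harmonic(n_terms):
    """Partial sum 1 - 1/2 + 1/3 - ... using the first n_terms terms."""
    total = 0.0
    for k in range(1, n_terms + 1):
        total += (-1) ** (k + 1) / k
    return total

# Compare a few partial sums with ln 2.
for n in (10, 1000, 100000):
    s = alternating_harmonic(n)
    print(n, s, abs(s - math.log(2)))
```

The error shrinks only about as fast as 1/n, so each additional decimal place costs roughly ten times as many terms.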

The integral test: If the terms of a series $a_n$ are positive and decreasing, and f(x) is a positive and decreasing function on the real number line such that $f(n) = a_n$, then the sum of $a_n$ from n = 1 to $\infty$ converges if and only if $\int_1^\infty f(x)\,dx$ does.

Sketch of proof: Since the theo- 
rem is supposed to hold for both 
convergence and divergence, and 
is also an "if and only if," there 
are actually four cases to prove, of 
which we pick the representative 
one where the integral is known to 
converge and we want to prove con- 
vergence of the corresponding sum. 
The sum and the integral can be 
interpreted as the areas under two 
graphs: one like a smooth ramp 
and one like a staircase. Sliding the 
staircase half a unit to the left, it 
lies entirely underneath the ramp, 
and therefore the area under it is 
also finite. 

Example 78

> Prove that the series 1 + 1/2 + 1/3 + ... diverges.

> The integral of 1/x is ln x, which diverges as x approaches infinity, so the series diverges as well.

The ratio test: If the limit $R = \lim_{n\to\infty} |a_{n+1}/a_n|$ exists, then the sum of $a_n$ converges if R < 1 and diverges if R > 1.

The proof can be obtained by com- 
paring with a geometric series. 

Example 79

> Prove that the series $1 + 1/2^2 + 1/3^3 + \ldots$ converges.

> R is easily proved to be 0, so the sum converges by the ratio test.

At this point it will seem like a mystery how anyone could have proved the exact results claimed for some of the "special" series, such as 1 - 1/2 + 1/3 - 1/4 + ... = ln 2. Problems like these are not the main focus of the chapter, and in fact there is no well-defined toolbox of techniques that will allow any such "nice" series to be evaluated exactly. Even a relatively innocent-looking example like $1^{-2} + 2^{-2} + 3^{-2} + \ldots$ defeated some of the best mathematicians of Europe for years (see problem 16, p. 114). It is currently unknown whether some apparently simple series such as $\sum_{n=1}^{\infty} 1/(n^3 \sin^2 n)$ converge (footnote 1).

7.4 Taylor series 

If you calculate $e^{0.1}$ on your calculator, you'll find that it's very close to 1.1. This is because the tangent line at x = 0 on the graph of $e^x$ has a slope of 1 ($de^x/dx = e^x = 1$ at x = 0), and the tangent line is a good approximation to the exponential curve as long as we don't get too far away from the point of tangency.



a / The function $e^x$, and the tangent line at x = 0.

How big is the error? The actual value of $e^{0.1}$ is 1.10517091807565..., which differs from 1.1 by about 0.005. If we go farther from the point of tangency, the approximation gets worse. At x = 0.2, the error is about 0.021, which is about four times bigger. In other words, doubling x seems to roughly quadruple the error, so the error is proportional to $x^2$; it seems to be about $x^2/2$. Well, if we want a handy-dandy, super-accurate estimate of $e^x$ for small values of x, why not just account for this error? Our new and improved estimate is

$$e^x \approx 1 + x + \frac{1}{2}x^2$$

for small values of x.
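The error estimates quoted above are easy to reproduce. The following is a small Python sketch; the helper names `linear_error` and `quadratic_error` are mine, chosen only for this illustration.

```python
import math

def linear_error(x):
    """Error of the tangent-line estimate 1 + x."""
    return math.exp(x) - (1 + x)

def quadratic_error(x):
    """Error of the improved estimate 1 + x + x**2/2."""
    return math.exp(x) - (1 + x + x**2 / 2)

for x in (0.1, 0.2):
    # Columns: x, linear error, the guess x**2/2, quadratic error
    print(x, linear_error(x), x**2 / 2, quadratic_error(x))
```

The second and third columns should roughly agree, and the last column shows how much is gained by including the $x^2/2$ term.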




1 Alekseyev, "On convergence of the Flint Hills series," arxiv.org/abs/1104.5100v1



b / The function $e^x$, and the approximation $1 + x + x^2/2$.

Figure b shows that the approximation is now extremely good for sufficiently small values of x. The difference is that whereas 1 + x matched both the y-intercept and the slope of the curve, $1 + x + x^2/2$ matches the curvature as well. Recall that the second derivative is a measure of curvature. The second derivatives of the function and its approximation, evaluated at x = 0, are

$$\left.\frac{d^2}{dx^2}e^x\right|_{x=0} = 1$$
$$\frac{d^2}{dx^2}\left(1 + x + \frac{1}{2}x^2\right) = 1$$

We can do even better. Suppose we want to match the third derivatives. All the derivatives of $e^x$, evaluated at x = 0, are 1, so we just need to add on a term proportional to $x^3$ whose third derivative is one. Taking the first derivative will bring down a factor of 3 in front, and taking the second derivative will give a 2, so to cancel these out we need the coefficient of the third-order term to be (1/2)(1/3):

$$e^x \approx 1 + x + \frac{1}{2}x^2 + \frac{1}{2\cdot 3}x^3$$

c / The function $e^x$, and the approximation $1 + x + x^2/2 + x^3/6$.



Figure c shows the result. For a 
significant range of x values close 
to zero, the approximation is now 
so good that we can't even see the 
difference between the two func- 
tions on the graph. 

On the other hand, figure d shows that the cubic approximation for somewhat larger negative and positive values of x is poor: worse, in fact, than the linear approximation, or even the constant approximation $e^x \approx 1$. This is to be expected, because any nonconstant polynomial will blow up to either positive or negative infinity as x approaches negative infinity, whereas the function $e^x$ is supposed to get very close to zero for large negative x. The idea here is that derivatives are local things: they only measure the properties of a function very close to the point at which they're evaluated, and they don't necessarily tell us anything about points far away.

It's a remarkable fact, then, that by taking enough terms in a polynomial approximation, we can always get as good an approximation to $e^x$ as necessary; it's just that a large number of terms may be required for large values of x. In other words, the infinite series

$$1 + x + \frac{1}{2}x^2 + \frac{1}{2\cdot 3}x^3 + \ldots$$

always gives exactly $e^x$. But what is the pattern here that would allow us to figure out, say, the fourth-order and fifth-order terms that were swept under the rug with the symbol "..."? Let's do the fifth-order term as an example. The point of adding in a fifth-order term is to make the fifth derivative of the approximation equal to the fifth derivative of $e^x$ at x = 0, which is 1. The first, second, ... derivatives of $x^5$ are

$$\frac{d}{dx}x^5 = 5x^4$$
$$\frac{d^2}{dx^2}x^5 = 5\cdot 4x^3$$
$$\frac{d^3}{dx^3}x^5 = 5\cdot 4\cdot 3x^2$$
$$\frac{d^4}{dx^4}x^5 = 5\cdot 4\cdot 3\cdot 2x$$
$$\frac{d^5}{dx^5}x^5 = 5\cdot 4\cdot 3\cdot 2\cdot 1$$

The notation for a product like $1\cdot 2\cdot\ldots\cdot n$ is n!, read "n factorial." So to get a term for our polynomial whose fifth derivative is 1, we need $x^5/5!$. The result for the infinite series is

$$e^x = \sum_{n=0}^{\infty}\frac{x^n}{n!},$$

where the special case of 0! = 1 is assumed (footnote 2). This infinite series is called the Taylor series for $e^x$, evaluated around x = 0, and it's true, although I haven't proved it, that this particular Taylor series always converges to $e^x$, no matter how far x is from zero.

d / The function $e^x$, and the approximation $1 + x + x^2/2 + x^3/6$, on a wider scale.

In general, the Taylor series around x = 0 for a function y is

$$T_0(x) = \sum_{n=0}^{\infty} a_n x^n,$$

where the condition for equality of the nth order derivative is

$$a_n = \frac{1}{n!}\left.\frac{d^n y}{dx^n}\right|_{x=0}.$$

Here the notation $|_{x=0}$ means that the derivative is to be evaluated at x = 0.
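As a rough check on the claim that the series for $e^x$ works for any x, but needs more terms when x is large, here is a short Python sketch. The function `exp_taylor` is a name I've made up for this illustration; it builds each term from the previous one rather than computing factorials directly.

```python
import math

def exp_taylor(x, n_terms):
    """Sum of x**n / n! for n = 0 .. n_terms - 1."""
    total = 0.0
    term = 1.0  # the n = 0 term, x**0 / 0!
    for n in range(n_terms):
        total += term
        term *= x / (n + 1)  # turns x**n/n! into x**(n+1)/(n+1)!
    return total

for x, n_terms in ((0.1, 3), (1.0, 20), (10.0, 5), (10.0, 60)):
    print(x, n_terms, exp_taylor(x, n_terms), math.exp(x))
```

With x = 10, five terms are nowhere near enough, while sixty terms reproduce the calculator's value to many digits.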

A Taylor series can be used to ap- 
proximate other functions besides 
e x , and when you ask your calcula- 
tor to evaluate a function such as a 
sine or a cosine, it may actually be 
using a Taylor series to do it. Taylor series are also the method Inf uses to calculate most expressions involving infinitesimals. In example 13 on page 29, we saw that when Inf was asked to calculate 1/(1 - d), where d was infinitesimal, the result was the geometric series:

: 1/(1-d)

1+d+d^2+d^3+d^4

These are also the first five terms of the Taylor series for the function y = 1/(1 - x), evaluated around x = 0. That is, the geometric series $1 + x + x^2 + x^3 + \ldots$ is really just one special example of a Taylor series, as demonstrated in the following example.

2 This makes sense, because, for example, 4!=5!/5, 3!=4!/4, etc., so we should have 0!=1!/1.



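Example 80

> Find the Taylor series of y = 1/(1 - x) around x = 0.

> Rewriting the function as $y = (1 - x)^{-1}$ and applying the chain rule, we have

$$y|_{x=0} = 1$$
$$\left.\frac{dy}{dx}\right|_{x=0} = \left.(1-x)^{-2}\right|_{x=0} = 1$$
$$\left.\frac{d^2y}{dx^2}\right|_{x=0} = \left.2(1-x)^{-3}\right|_{x=0} = 2$$
$$\left.\frac{d^3y}{dx^3}\right|_{x=0} = \left.2\cdot 3(1-x)^{-4}\right|_{x=0} = 2\cdot 3$$

The pattern is that the nth derivative is n!. The Taylor series therefore has $a_n = n!/n! = 1$:

$$\frac{1}{1-x} = 1 + x + x^2 + x^3 + \ldots$$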



If you flip back to page 104 and 
compare the rate of convergence of 
the geometric series for x = 0.1 
and 0.5, you'll see that the sum 
converged much more quickly for 
x = 0.1 than for x = 0.5. In 
general, we expect that any Taylor 
series will converge more quickly 
when x is smaller. Now consider 
what happens at x = 1. The series 
is now 1 + 1 + 1 + . . ., which gives 
an infinite result, and we shouldn't 
have expected any better behav- 
ior, since attempting to evaluate 
1/(1 — x) at x = 1 gives divi- 
sion by zero. For x > 1, the re- 
sults become nonsense. For exam- 
ple, 1/(1 - 2) = -1, which is fi- 
nite, but the geometric series gives 
1 + 2 + 4 + . . ., which is infinite. 
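The three regimes described here (fast convergence, slower convergence, and nonsense) can be seen in a few lines of Python. This is only a sketch, and the function name `geometric_partial_sum` is my own.

```python
def geometric_partial_sum(x, n_terms):
    """Sum 1 + x + x**2 + ... using the first n_terms terms."""
    total = 0.0
    power = 1.0  # current power of x, starting with x**0
    for _ in range(n_terms):
        total += power
        power *= x
    return total

for x in (0.1, 0.5, 2.0):
    # Columns: x, five-term partial sum, the value of 1/(1 - x)
    print(x, geometric_partial_sum(x, 5), 1 / (1 - x))
```

For x = 0.1 the five-term sum is already very close to 1/(1 - x); for x = 0.5 it is visibly short; and for x = 2 the partial sums grow without limit while 1/(1 - x) stays at -1.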

In general, every function's Taylor series around x = 0 converges for all values of x in the range defined by |x| < r, where r is some number, known as the radius of convergence. Also, if the function is defined by putting together other functions that are well behaved (in the sense of converging to their own Taylor series in the relevant region), then the Taylor series will not only converge but converge to the correct value. For the function $e^x$, the radius happens to be infinite, whereas for 1/(1 - x) it equals 1. The following example shows a worst-case scenario.



Example 81
The function $y = e^{-1/x^2}$, shown in figure e, never converges to its Taylor series, except at x = 0. This is because the Taylor series for this function, evaluated around x = 0, is exactly zero! At x = 0, we have y = 0, dy/dx = 0, $d^2y/dx^2 = 0$, and so on for every derivative. The zero function matches the function y(x) and all its derivatives to all orders, and yet is useless as an approximation to y(x). The radius of convergence of the Taylor series is infinite, but it doesn't give correct results except at x = 0. The reason for this is that y was built by composing two functions, $w(x) = -1/x^2$ and $y(w) = e^w$. The function w is badly behaved at x = 0 because it blows up there. In particular, it doesn't have a well-defined Taylor series at x = 0.

e / The function $e^{-1/x^2}$ never converges to its Taylor series.

Example 82
> Find the Taylor series of y = sin x, evaluated around x = 0.

> The first few derivatives are

$$\frac{d}{dx}\sin x = \cos x$$
$$\frac{d^2}{dx^2}\sin x = -\sin x$$
$$\frac{d^3}{dx^3}\sin x = -\cos x$$
$$\frac{d^4}{dx^4}\sin x = \sin x$$
$$\frac{d^5}{dx^5}\sin x = \cos x$$

We can see that there will be a cycle of sin, cos, -sin, and -cos, repeating indefinitely. Evaluating these derivatives at x = 0, we have 0, 1, 0, -1, .... All the even-order terms of the series are zero, and all the odd-order terms are $\pm 1/n!$. The result is

$$\sin x = x - \frac{1}{3!}x^3 + \frac{1}{5!}x^5 - \ldots$$

The linear term is the familiar small-angle approximation $\sin x \approx x$.

The radius of convergence of this series turns out to be infinite. Intuitively the reason for this is that the factorials grow extremely rapidly, so that the successive terms in the series eventually start to diminish quickly, even for large values of x.
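Here is a minimal Python sketch of the sine series, in case you'd like to watch the factorials win. The name `sin_taylor` is hypothetical; each term is built from the one before it, so no factorial is ever computed explicitly.

```python
import math

def sin_taylor(x, n_terms):
    """Sum the first n_terms nonzero terms of x - x**3/3! + x**5/5! - ..."""
    total = 0.0
    term = x  # the first nonzero term, x**1 / 1!
    for k in range(n_terms):
        total += term
        # Next odd-order term: multiply by -x**2 / ((2k+2)(2k+3)).
        term *= -x * x / ((2 * k + 2) * (2 * k + 3))
    return total

for x, n_terms in ((0.1, 1), (1.0, 10), (10.0, 40)):
    print(x, n_terms, sin_taylor(x, n_terms), math.sin(x))
```

Even at x = 10, where the early terms are in the thousands, forty terms are enough for the sum to settle down to the value of sin 10.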

Example 83 
Suppose that we want to evaluate a limit of the form

$$\lim_{x\to 0}\frac{u(x)}{v(x)},$$

where u(0) = v(0) = 0. L'Hopital's rule tells us that we can do this by taking derivatives on the top and bottom to form u'/v', and that, if necessary, we can do more than one derivative, e.g., u''/v''. This was proved on p. 148 using the mean value theorem. But if u and v are both functions that converge to their Taylor series, then it is much easier to see why this works. For example, suppose that their Taylor series both have vanishing constant and linear terms, so that $u = ax^2 + \ldots$ and $v = bx^2 + \ldots$. Then $u'' = 2a + \ldots$, and $v'' = 2b + \ldots$.

A function's Taylor series doesn't have to be evaluated around x = 0. The Taylor series around some other center x = c is given by

$$T_c(x) = \sum_{n=0}^{\infty} a_n (x - c)^n,$$

where

$$a_n = \frac{1}{n!}\left.\frac{d^n y}{dx^n}\right|_{x=c}.$$

To see that this is the right generalization, we can do a change of variable, defining a new function g(x) = f(x + c), so that f(x) = g(x - c), and expanding g around 0. The radius of convergence is to be measured from the center c rather than from 0.

Example 84

> Find the Taylor series of ln x, evaluated around x = 1.

> Evaluating a few derivatives, we get

$$\frac{d}{dx}\ln x = x^{-1}$$
$$\frac{d^2}{dx^2}\ln x = -x^{-2}$$
$$\frac{d^3}{dx^3}\ln x = 2x^{-3}$$
$$\frac{d^4}{dx^4}\ln x = -6x^{-4}$$

Note that evaluating these at x = 0 wouldn't have worked, since division by zero is undefined; this is because ln x blows up to negative infinity at x = 0. Evaluating them at x = 1, we find that the nth derivative equals $\pm(n-1)!$, so the coefficients of the Taylor series are $\pm(n-1)!/n! = \pm 1/n$, except for the n = 0 term, which is zero because ln 1 = 0. The resulting series is

$$\ln x = (x-1) - \frac{1}{2}(x-1)^2 + \frac{1}{3}(x-1)^3 - \ldots$$

We can predict that its radius of convergence can't be any greater than 1, because ln x blows up at 0, which is at a distance of 1 from 1.
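The series for ln x around x = 1 has coefficients $\pm 1/n$, and its radius of convergence can't exceed 1. A quick numerical sketch in Python shows both the good and the bad behavior; the function name `ln_taylor` is my own invention for this purpose.

```python
import math

def ln_taylor(x, n_terms):
    """Sum of (-1)**(n+1) * (x-1)**n / n for n = 1 .. n_terms."""
    total = 0.0
    for n in range(1, n_terms + 1):
        total += (-1) ** (n + 1) * (x - 1) ** n / n
    return total

# x = 1.5 is half a unit from the center; x = 3 is two units away.
print(ln_taylor(1.5, 60), math.log(1.5))
print(ln_taylor(3.0, 60), math.log(3.0))
```

At x = 1.5 the partial sum agrees closely with the logarithm, while at x = 3, which lies outside the radius of convergence, the partial sums are enormous and bear no relation to ln 3.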
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Problems 

1 Modify the Weierstrass defini- 
tion of the limit to apply to infinite 
sequences. > Solution, p. 186 

2 (a) Prove that the infinite se- 
ries 1-1 + 1-1 + 1-1 + .. . 
does not converge to any limit, us- 
ing the generalization of the Weier- 
strass limit found in problem 1. 
(b) Criticize the following argu- 
ment. The series given in part a 
equals zero, because addition is as- 
sociative, so we can rewrite it as 
(1-1) + (1-1) + (1-1) + ... 

> Solution, p. 186 

3 Use the integral test to prove the convergence of the geometric series for 0 < x < 1.

> Solution, p. 187 

4 Determine the convergence or 
divergence of the following series. 

(a) $1 + 1/2^2 + 1/3^2 + \ldots$

(b) $1/\ln\ln 3 - 1/\ln\ln 6 + 1/\ln\ln 9 - 1/\ln\ln 12 + \ldots$

(c) $$\frac{1}{\ln 2} + \frac{1}{(\ln 2)(\ln 3)} + \frac{1}{(\ln 2)(\ln 3)(\ln 4)} + \ldots$$

(d) $$\frac{2\sqrt{2}}{9801}\sum_{k=0}^{\infty}\frac{(4k)!\,(1103 + 26390k)}{(k!)^4\,396^{4k}}$$



> Solution, p. 187 



5 Give an example of a series for 
which the ratio test is inconclusive. 
> Solution, p. 187 



6 Find the Taylor series expansion of cos x around x = 0. Check your work by combining the first two terms of this series with the first term of the sine function from example 82 on page 110 to verify that the trig identity $\sin^2 x + \cos^2 x = 1$ holds for terms up to order $x^2$.

7 In classical physics, the kinetic energy K of an object of mass m moving at velocity v is given by $K = \frac{1}{2}mv^2$. For example, if a car is to start from a stoplight and then accelerate up to v, this is the theoretical minimum amount of energy that would have to be used up by burning gasoline. (In reality, a car's engine is not 100% efficient, so the amount of gas burned is greater.)

Einstein's theory of relativity states that the correct equation is actually

$$K = \left(\frac{1}{\sqrt{1 - v^2/c^2}} - 1\right)mc^2,$$

where c is the speed of light. The fact that it diverges as $v \to c$ is interpreted to mean that no object can be accelerated to the speed of light.

Expand K in a Taylor series, and show that the first nonvanishing term is equal to the classical expression. This means that for velocities that are small compared to the speed of light, the classical expression is a good approximation, and Einstein's theory does not contradict any of the prior empirical evidence from which the classical expression was inferred.

8 Expand $(1 + x)^{1/3}$ in a Taylor series around x = 0. The value x = 28 lies outside this series' radius of convergence, but we can nevertheless use it to extract the cube root of 28 by recognizing that $28^{1/3} = 3(28/27)^{1/3}$. Calculate the root to four significant figures of precision, and check it in the obvious way.

9 Find the Taylor series expansion of $\log_2 x$ around x = 1, and use it to evaluate $\log_2 1.0595$ to four significant figures of precision. Check your result by using the fact that 1.0595 is approximately the twelfth root of 2. This number is the ratio of the frequencies of two successive notes of the chromatic scale in music, e.g., C and D-flat.



10 In free fall, the acceleration 
will not be exactly constant, due 
to air resistance. For example, a 
skydiver does not speed up indefi- 
nitely until opening her chute, but 
rather approaches a certain maxi- 
mum velocity at which the upward 
force of air resistance cancels out 
the force of gravity. If an object is dropped from a height h, and the time it takes to reach the ground is used to measure the acceleration of gravity, g, then the relative error in the result due to air resistance is (see footnote 3)

$$E = \frac{g - g_{vacuum}}{g} = 1 - \frac{2b}{\ln^2\left(e^b + \sqrt{e^{2b} - 1}\right)},$$

where b = h/A, and A is a constant
that depends on the size, shape, 
and mass of the object, and the 
density of the air. (For a sphere of 
mass m and diameter d dropping 
in air, A = AAlm/d 2 . Cf. problem 
18, p. 48.) Evaluate the constant 
and linear terms of the Taylor se- 
ries for the function E(b). 

11 (a) Prove that the conver- 
gence of an infinite series is un- 
affected by omitting some initial 
terms, (b) Similarly, prove that 
convergence is unaffected by mul- 
tiplying all the terms by some con- 
stant factor. 



12 The identity

$$\int_0^1 x^{-x}\,dx = \sum_{n=1}^{\infty} n^{-n}$$



is known as the "Sophomore's 
dream," because at first glance it 
looks like the kind of plausible 
but false statement that someone 
would naively dream up. Verify it 
numerically by machine computa- 
tion. 

13 Does sin x + sin sin x + sin sin sin x + ... converge?

> Solution, p. 188 * 



3 Jan Benacka and Igor Stubna, The Physics Teacher, 43 (2005) 432.

14 Evaluate

$$1 + \frac{1}{1+2} + \frac{1}{1+2+3} + \ldots$$

> Solution, p. 188 * 
15 Evaluate

$$\sum_{n=0}^{\infty}\frac{(-1)^n}{n + 1 + 1/n!}$$

to six decimal places.

16 Euler was the first to prove

$$\frac{1}{1^2} + \frac{1}{2^2} + \frac{1}{3^2} + \ldots = \frac{\pi^2}{6}.$$

This problem had defeated other great mathematicians of his time, and was famous enough to be given a special name, the Basel problem. Here we present an argument based closely on Euler's and pose the problem of how to exploit Euler's technique further in order to prove

$$\frac{1}{1^4} + \frac{1}{2^4} + \frac{1}{3^4} + \ldots = \frac{\pi^4}{90}.$$

From the Taylor series for the sine function, we find the related series

$$f(x) = 1 - \frac{x}{3!} + \frac{x^2}{5!} - \ldots$$

The partial sums of this series are polynomials that approximate f for small values of x. If such a polynomial were exact rather than approximate, then it would have zeroes at $x = \pi^2, 4\pi^2, 9\pi^2, \ldots$, and we could write it as the product of its linear factors. Euler assumed, without any more rigorous proof, that this factorization procedure could be extended to the infinite series, so that f could be represented as the infinite product

$$f(x) = \left(1 - \frac{x}{\pi^2}\right)\left(1 - \frac{x}{4\pi^2}\right)\cdots$$

By multiplying this out and equating its linear term to that of the Taylor series, we find the claimed result.

Extend this procedure to the $x^2$ term and prove the result claimed for the sum of the inverse fourth powers of the integers. (The sums with odd exponents $\ge 3$ are much harder, and relatively little is known about them. The sum of the inverse cubes is known as Apery's constant.) *



17 Does

$$\int_0^{\infty}\sin(x^2)\,dx$$

converge, or not?

> Solution, p. 188 * 
18 Evaluate 



lim cos(ir\/n 2 



where n is an integer. * 

19 Determine the convergence 
of the series 



5> 2 2" 



ra=0 



and if it converges, evaluate it. 

> Solution, p. 189 * 
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20 Determine the convergence 
of the series 

DC 

and if it converges, evaluate it. 

> Solution, p. 189 * 

21 For what integer values of p 
should we expect the series 



E 



cosn| 
nP 



to converge? A rigorous proof is 
very difficult and may even be an 
open problem, but it is relatively 
straightforward to give a convinc- 
ing argument. 

> Solution, p. 189 * 
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8 Complex number 
techniques 



8.1 Review of 
complex 
numbers 

For a more detailed treatment of complex numbers, see ch. 3 of James Nearing's free book at http://www.physics.miami.edu/nearing/mathmethods/.






a / Visualizing complex numbers as 
points in a plane. 



We assume there is a number, i, such that $i^2 = -1$. The square roots of -1 are then i and -i. (In electrical engineering work, where i stands for current, j is sometimes used instead.) This gives rise to a number system, called the complex numbers, containing the real numbers as a subset. Any complex number z can be written in the form z = a + bi, where a and b are real, and a and b are then referred to as the real and imaginary parts of z. A number with a zero real part is called an imaginary number. The complex numbers can be visualized as a plane, figure a, with the real number line placed horizontally like the x axis of the familiar x-y plane, and the imaginary numbers running along the y axis. The complex numbers are complete in a way that the real numbers aren't: every nonzero complex number has two square roots. For example, 1 is a real number, so it is also a member of the complex numbers, and its square roots are -1 and 1. Likewise, -1 has square roots i and -i, and the number i has square roots $1/\sqrt{2} + i/\sqrt{2}$ and $-1/\sqrt{2} - i/\sqrt{2}$.

b / Addition of complex numbers is just like addition of vectors, although the real and imaginary axes don't actually represent directions in space.

c / A complex number and its conjugate.

Complex numbers can be added 
and subtracted by adding or sub- 
tracting their real and imaginary 
parts, figure b. Geometrically, this 
is the same as vector addition. 

The complex numbers a + bi and 
a — bi, lying at equal distances 
above and below the real axis, are 
called complex conjugates. For a quadratic equation with real coefficients, the results of the quadratic formula are either both real, or complex conjugates of each other. The complex conjugate of a number z is notated as $\bar{z}$ or $z^*$.

The complex numbers obey all the 
same rules of arithmetic as the re- 
als, except that they can't be or- 
dered along a single line. That is, 



it's not possible to say whether one 
complex number is greater than 
another. We can compare them 
in terms of their magnitudes (their 
distances from the origin), but 
two distinct complex numbers may 
have the same magnitude, so, for 
example, we can't say whether 1 is 
greater than i or i is greater than 
1. 

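Example 85

> Prove that $1/\sqrt{2} + i/\sqrt{2}$ is a square root of i.

> Our proof can use any ordinary rules of arithmetic, except for ordering.

$$\left(\frac{1}{\sqrt{2}} + \frac{i}{\sqrt{2}}\right)^2 = \frac{1}{\sqrt{2}}\cdot\frac{1}{\sqrt{2}} + \frac{1}{\sqrt{2}}\cdot\frac{i}{\sqrt{2}} + \frac{i}{\sqrt{2}}\cdot\frac{1}{\sqrt{2}} + \frac{i}{\sqrt{2}}\cdot\frac{i}{\sqrt{2}}$$
$$= \frac{1}{2}(1 + i + i - 1)$$
$$= i$$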



Example 85 showed one method 
of multiplying complex numbers. 
However, there is another nice in- 
terpretation of complex multiplica- 
tion. We define the argument of 
a complex number, figure d, as its 
angle in the complex plane, mea- 
sured counterclockwise from the 
positive real axis. Multiplying 
two complex numbers then corre- 
sponds to multiplying their magni- 
tudes, and adding their arguments, 
figure e. 
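The rule "multiply the magnitudes, add the arguments" can be checked with Python's built-in complex numbers. This is only a sketch; the helper `polar_product` and the sample values of u and v are my own choices for illustration.

```python
import cmath

def polar_product(u, v):
    """Return (magnitude, argument) of uv computed the geometric way."""
    magnitude = abs(u) * abs(v)                 # multiply magnitudes
    argument = cmath.phase(u) + cmath.phase(v)  # add arguments
    return magnitude, argument

u = 1 + 1j   # magnitude sqrt(2), argument 45 degrees
v = 2j       # magnitude 2, argument 90 degrees

magnitude, argument = polar_product(u, v)
print(u * v)                            # ordinary multiplication
print(cmath.rect(magnitude, argument))  # rebuilt from magnitude and argument
```

The two printed numbers should agree up to rounding. Note that `cmath.phase` reports arguments in radians between -pi and pi, so for other inputs the sum of the arguments may differ from the phase of the product by a full turn, which represents the same direction.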

Self-Check 

Using this interpretation of multiplication, how could you find the square roots of a complex number? > Answer, p. 161

d / A complex number can be described in terms of its magnitude and argument.



Example 86 
The magnitude |z| of a complex number z obeys the identity $|z|^2 = z\bar{z}$. To prove this, we first note that $\bar{z}$ has the same magnitude as z, since flipping it to the other side of the real axis doesn't change its distance from the origin. Multiplying z by $\bar{z}$ gives a result whose magnitude is found by multiplying their magnitudes, so the magnitude of $z\bar{z}$ must therefore equal $|z|^2$. Now we just have to prove that $z\bar{z}$ is a positive real number. But if, for example, z lies counterclockwise from the real axis, then $\bar{z}$ lies clockwise from it. If z has a positive argument, then $\bar{z}$ has a negative one, or vice-versa. The sum of their arguments is therefore zero, so the result has an argument of zero, and is on the positive real axis (footnote 1).



e / The argument of uv is the sum of 
the arguments of u and v. 

This whole system was built up in order to make every number have square roots. What about cube roots, fourth roots, and so on? Does it get even more weird when you want to do those as well? No. The complex number system we've already discussed is sufficient to handle all of them. The nicest way of thinking about it is in terms of roots of polynomials. In the real number system, the polynomial $x^2 - 1$ has two roots, i.e., two values of x (plus and minus one) that we can plug in to the polynomial and get zero. Because it has these two real roots, we can rewrite the polynomial as (x - 1)(x + 1). However, the polynomial $x^2 + 1$ has no real roots. It's ugly that in the real number system, some second-order polynomials have two roots, and can be factored, while others can't. In the complex number system, they all can. For instance, $x^2 + 1$ has roots i and -i, and can be factored as (x - i)(x + i). In general, the fundamental theorem of algebra states that in the complex number system, any nth-order polynomial can be factored completely into n linear factors, and we can also say that it has n complex roots, with the understanding that some of the roots may be the same. For instance, the fourth-order polynomial $x^4 + x^2$ can be factored as (x - i)(x + i)(x - 0)(x - 0), and we say that it has four roots, i, -i, 0, and 0, two of which happen to be the same. This is a sensible way to think about it, because in real life, numbers are always approximations anyway, and if we make tiny, random changes to the coefficients of this polynomial, it will have four distinct roots, of which two just happen to be very close to zero. I've given a proof of the fundamental theorem of algebra on page 158.

1 I cheated a little. If z's argument is 30 degrees, then we could say $\bar{z}$'s was -30, but we could also call it 330. That's OK, because 330+30 gives 360, and an argument of 360 is the same as an argument of zero.

8.2 Euler's formula 

Having expanded our horizons to 
include the complex numbers, it's 
natural to want to extend func- 
tions we knew and loved from the 
world of real numbers so that they 
can also operate on complex num- 
bers. The only really natural way 
to do this in general is to use Tay- 
lor series. A particularly beautiful 



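thing happens with the functions $e^x$, sin x, and cos x:

$$e^x = 1 + x + \frac{1}{2!}x^2 + \frac{1}{3!}x^3 + \ldots$$
$$\cos x = 1 - \frac{1}{2!}x^2 + \frac{1}{4!}x^4 - \ldots$$
$$\sin x = x - \frac{1}{3!}x^3 + \frac{1}{5!}x^5 - \ldots$$

If $x = i\phi$ is an imaginary number, we have

$$e^{i\phi} = \cos\phi + i\sin\phi,$$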



a result known as Euler's formula. 
The geometrical interpretation in 
the complex plane is shown in fig- 
ure f. 




f / The complex number $e^{i\phi}$ lies on the unit circle.

Although the result may seem like something out of a freak show at first, applying the definition (footnote 2) of the exponential function makes it clear how natural it is:

$$e^x = \lim_{n\to\infty}\left(1 + \frac{x}{n}\right)^n$$

When $x = i\phi$ is imaginary, the quantity $(1 + i\phi/n)$ represents a number lying just above 1 in the complex plane. For large n, $(1 + i\phi/n)$ becomes very close to the unit circle, and its argument is the small angle $\phi/n$. Raising this number to the nth power multiplies its argument by n, giving a number with an argument of $\phi$.

2 See page 147 for an explanation of where this definition comes from and why it makes sense.
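The limit argument can be tried out numerically. The following Python sketch raises $(1 + i\phi/n)$ to the nth power for increasing n and compares the result with $\cos\phi + i\sin\phi$. The function name `compound` and the choice of $\phi = 1$ are mine, for illustration only.

```python
import cmath
import math

def compound(phi, n):
    """Evaluate (1 + i*phi/n)**n, the finite-n version of the limit."""
    return (1 + 1j * phi / n) ** n

phi = 1.0
target = complex(math.cos(phi), math.sin(phi))  # cos(phi) + i sin(phi)

for n in (10, 1000, 10**6):
    z = compound(phi, n)
    # Columns: n, magnitude, argument, distance from the target
    print(n, abs(z), cmath.phase(z), abs(z - target))
```

As n grows, the magnitude approaches 1 and the argument approaches $\phi$, so the result lands closer and closer to the point on the unit circle that Euler's formula predicts.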




g / Leonhard 
(1707-1783) 



Euler 



Euler's formula is used frequently 
in physics and engineering. 



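Example 87

> Write the sine and cosine functions in terms of exponentials.

> Euler's formula for $x = -i\phi$ gives $\cos\phi - i\sin\phi$, since $\cos(-\theta) = \cos\theta$, and $\sin(-\theta) = -\sin\theta$. Adding this to, or subtracting it from, the formula for $x = i\phi$ gives

$$\cos x = \frac{e^{ix} + e^{-ix}}{2}$$
$$\sin x = \frac{e^{ix} - e^{-ix}}{2i}$$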



Example 88

> Evaluate

$$\int e^x\cos x\,dx$$

> Problem 14 on p. 97 suggested a special-purpose trick for doing this integral. An approach that doesn't rely on tricks is to rewrite the cosine in terms of exponentials:

$$\int e^x\cos x\,dx = \int e^x\left(\frac{e^{ix} + e^{-ix}}{2}\right)dx$$
$$= \frac{1}{2}\int\left(e^{(1+i)x} + e^{(1-i)x}\right)dx$$
$$= \frac{1}{2}\left(\frac{e^{(1+i)x}}{1+i} + \frac{e^{(1-i)x}}{1-i}\right) + c$$

Since this result is the integral of a real-valued function, we'd like it to be real, and in fact it is, since the first and second terms are complex conjugates of one another. If we wanted to, we could use Euler's formula to convert it back to a manifestly real result (footnote 3).



3 In general, the use of complex number techniques to do an integral could result in a complex number, but that complex number would be a constant, which could be subsumed within the usual constant of integration.

Example 89
Euler found the equation

$$\pi = 20\tan^{-1}\frac{1}{7} + 8\tan^{-1}\frac{3}{79},$$

which allowed the computation of $\pi$ to high precision in the era before electronic calculators, since the Taylor series for the inverse tangent converges rapidly for small inputs. A cute way of proving the validity of the equation is to calculate

$$(7 + i)^{20}(79 + 3i)^8$$

as follows in Yacas:

(7+I)^20*(79+3*I)^8;
-149011611938476562500000000000000

The fact that it is purely real, and has a negative real part, demonstrates that the quantity on the right side of the original equation equals $\pi + 2\pi n$, where n is an integer. Numerical estimation shows that n = 0. Although the proof was straightforward, it provides zero insight into how Euler figured it out in the first place!
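If you don't have Yacas handy, the same exact calculation can be sketched in Python. Python's built-in complex type uses floating point, which would round a number this large, so the sketch below represents each complex number as a pair of integers (real part, imaginary part). The helpers `gauss_mul` and `gauss_pow` are names I've invented for this illustration.

```python
def gauss_mul(a, b):
    """Multiply two complex numbers given as (real, imag) integer pairs."""
    (ar, ai), (br, bi) = a, b
    return (ar * br - ai * bi, ar * bi + ai * br)

def gauss_pow(z, n):
    """Raise z to the nonnegative integer power n by repeated multiplication."""
    result = (1, 0)
    for _ in range(n):
        result = gauss_mul(result, z)
    return result

# (7 + i)**20 * (79 + 3i)**8, computed exactly
value = gauss_mul(gauss_pow((7, 1), 20), gauss_pow((79, 3), 8))
print(value)
```

The imaginary part comes out to exactly zero and the real part is negative, which is the fact the argument above depends on.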

8.3 Partial fractions 
revisited 

Suppose we want to evaluate the 
integral 

dx 



Example 89 gives A = i/2 and B = —i/2, so 



x 2 + l 



by the method of partial fractions. 
The quadratic formula tells us that 
the roots are % and —i, setting 
l/(x 2 + 1) = A/(x + i) + B/(x - i) 



dx 



dx 



x 2 + 1 2 J x + i 
dx 



2 ./ x — i 

i 

2 
i 

2 

i x + i 

- In 

2 x — i 



ln(x + i) 
ln(x — i) 



The attractive thing about this ap- 
proach, compared with the method 
used on page 86, is that it doesn't 
require any tricks. If you came 
across this integral ten years from 
now, you could pull out your old 
calculus book, flip through it, and 
say, "Oh, here we go, there's a way 
to integrate one over a polynomial 
— partial fractions." On the other 
hand, it's odd that we started out 
trying to evaluate an integral that 
had nothing but real numbers, and 
came out with an answer that isn't 
even obviously a real number. 

But what about that expression $(x+i)/(x-i)$? Let's give it a name, w. The numerator and denominator are complex conjugates of one another. Since they have the same magnitude, we must have $|w| = 1$, i.e., w is a complex number that lies on the unit circle, the kind of complex number that Euler's formula refers to. The numerator has an argument of $\tan^{-1}(1/x) = \pi/2 - \tan^{-1}x$, and the denominator has the same argument but with the opposite sign. Division means subtracting arguments, so $\arg w = \pi - 2\tan^{-1}x$. That means that the result can be rewritten using Euler's formula as

$\int \frac{dx}{x^2 + 1} = \frac{i}{2}\ln w$

$= \frac{i}{2}\cdot i(\pi - 2\tan^{-1}x)$

$= \tan^{-1}x + c$

In other words, it's the same result we found before, but found without the need for trickery.
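As a rough numerical sanity check, the complex-valued antiderivative can be compared with the inverse tangent in Python. The function names below are invented for this sketch, and it is restricted to x > 0, where the argument formula above applies and the principal branch of the logarithm is the right one.

```python
import cmath
import math

def complex_antiderivative(x):
    """(i/2) ln[(x + i)/(x - i)], using the principal branch of the logarithm."""
    w = (x + 1j) / (x - 1j)
    return 0.5j * cmath.log(w)

def offset_from_arctan(x):
    """Difference between the complex form and arctan(x); should be constant."""
    return complex_antiderivative(x) - math.atan(x)

for x in (0.5, 1.0, 3.0, 10.0):
    print(x, offset_from_arctan(x))
```

For every positive x tried, the difference is the same real constant, $-\pi/2$, with a negligible imaginary part, so the two antiderivatives differ only by a constant of integration.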

Example 90

> Evaluate $\int dx/\sin x$.

> This can be tackled by rewriting the sine function in terms of complex exponentials, changing variables to $u = e^{ix}$, and then using partial fractions.

$\int \frac{dx}{\sin x} = 2i\int \frac{dx}{e^{ix} - e^{-ix}}$

$= 2i\int \frac{du/(iu)}{u - 1/u}$

$= 2\int \frac{du}{u^2 - 1}$

$= \int \frac{du}{u - 1} - \int \frac{du}{u + 1}$

$= \ln(u - 1) - \ln(u + 1) + c$

$= \ln\frac{e^{ix} - 1}{e^{ix} + 1} + c$

$= \ln(i\tan(x/2)) + c$

$= \ln\tan(x/2) + c'$
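A quick way to gain confidence in a result like this is to differentiate it numerically and compare with the integrand. The following Python sketch does that with a central difference; the function names and the step size `h` are arbitrary choices made for this illustration, and the test points are kept in 0 < x < π so that tan(x/2) is positive.

```python
import math

def antiderivative(x):
    """ln tan(x/2), the result of example 90 (valid here for 0 < x < pi)."""
    return math.log(math.tan(x / 2))

def central_difference(f, x, h=1e-5):
    """Approximate f'(x) by a symmetric finite difference."""
    return (f(x + h) - f(x - h)) / (2 * h)

for x in (0.5, 1.0, 2.0):
    print(x, central_difference(antiderivative, x), 1 / math.sin(x))
```

The two printed columns agree to many decimal places, as expected if the derivative of ln tan(x/2) is 1/sin x.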
Problems

1 Find arg i, arg(— i), and arg37, 
where arg z denotes the argument 
of the complex number z. 

2 Visualize the following multiplications in the complex plane using the interpretation of multiplication in terms of multiplying magnitudes and adding arguments: $(i)(i) = -1$, $(i)(-i) = 1$, $(-i)(-i) = -1$.

3 If we visualize z as a point in the complex plane, how should we visualize $-z$?



> Solution, p. 190 

10 Find every complex number z such that $z^3 = 1$.

> Solution, p. 190

11 Factor the expression $x^3 - y^3$ into factors of the lowest possible order, using complex coefficients. (Hint: use the result of problem 10.) Then do the same using real coefficients. > Solution, p. 190



12 Evaluate 



d.r 



x 2 + Ax - A 



4 Find four different complex numbers z such that $z^4 = 1$.

13 Evaluate

-ax £Qg foj* ^j* 

5 Compute the following: 



$|1 + i|$, $\arg(1 + i)$



1+t 



, arg 



1 + i 



1 + i 



6 Write the function $\tan x$ in terms of complex exponentials.

7 Evaluate $\int \sin x\,dx$.

8 Use Euler's theorem to derive 
the addition theorems that express 
sin(a + b) and cos(a + b) in terms 
of the sines and cosines of a and b. 

> Solution, p. 190 

9 Evaluate 



7r/2 



cos x cos 2x dx 



14 (a) Discuss how the integral 
dx 



,10000 



1 



could be evaluated, in principle, in closed form. (b) See what happens when you try to evaluate it using computer software. (c) Express it as a finite sum.

> Solution, p. 191 * 



9 Iterated integrals 



9.1 Integrals inside 
integrals 

In various applications, you need 
to do integrals stuck inside other 
integrals. These are known as it- 
erated integrals, or double inte- 
grals, triple integrals, etc. Simi- 
lar concepts crop up all the time 
even when you're not doing cal- 
culus, so let's start by imagining 
such an example. Suppose you 
want to count how many squares 
there are on a chess board, and you 
don't know how to multiply eight 
times eight. You could start from 
the upper left, count eight squares 
across, then continue with the second row, and so on, until you
have counted every square, giving
the result of 64. In slightly more
formal mathematical language, we 
could write the following recipe: 
for each row, r, from 1 to 8, con- 
sider the columns, c, from 1 to 8, 
and add one to the count for each 
one of them. Using the sigma no- 
tation, this becomes 



$\sum_{r=1}^{8}\sum_{c=1}^{8} 1$



If you're familiar with computer 
programming, then you can think 
of this as a sum that could be 
calculated using a loop nested in- 
side another loop. To evaluate the 
result (again, assuming we don't 



know how to multiply, so we have 
to use brute force), we can first 
evaluate the inside sum, which 
equals 8, giving 



$\sum_{r=1}^{8} 8$ .

Notice how the "dummy" variable 
c has disappeared. Finally we do 
the outside sum, over r, and find 
the result of 64. 
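For readers who like to see the loop-inside-a-loop picture spelled out, here is a minimal Python sketch of the chess board count. The variable names `count`, `inner` and `outer` are just labels chosen for this illustration.

```python
# Brute-force count: one loop nested inside another, adding 1 per square
count = 0
for r in range(1, 9):        # rows 1 through 8
    for c in range(1, 9):    # columns 1 through 8
        count += 1

# The same thing done the way the double sum is evaluated by hand:
# first the inside sum over c, then the outside sum over r
inner = sum(1 for c in range(1, 9))
outer = sum(inner for r in range(1, 9))

print(count, inner, outer)
```

Once the inner sum has been evaluated, the column variable c no longer appears anywhere, which is the sense in which it is a "dummy" variable.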



Now imagine doing the same thing 
with the pixels on a TV screen. 
The electron beam sweeps across 
the screen, painting the pixels in 
each row, one at a time. This is re- 
ally no different than the example 
of the chess board, but because the 
pixels are so small, you normally 
think of the image on a TV screen 
as continuous rather than discrete. 
This is the idea of an integral in 
calculus. Suppose we want to find 
the area of a rectangle of width a 
and height b, and we don't know 
that we can just multiply to get 
the area ab. The brute force way 
to do this is to break up the rect- 
angle into a grid of infinitesimally 
small squares, each having width 
dx and height dy, and therefore the infinitesimal area dA = dx dy. For convenience, we'll imagine that the rectangle's lower left corner is at the origin. Then the area is given by this integral:

$\text{area} = \int_{y=0}^{b}\int_{x=0}^{a} dA$

$= \int_{y=0}^{b}\int_{x=0}^{a} dx\,dy$

Notice how the leftmost integral sign, over y, and the rightmost differential, dy, act like bookends, or the pieces of bread on a sandwich. Inside them, we have the integral sign that runs over x, and the differential dx that matches it on the right. Finally, on the innermost layer, we'd normally have the thing we're integrating, but here it's 1, so I've omitted it. Writing the lower limits of the integrals with x = 0 and y = 0 helps to keep it straight which integral goes with which differential. The result is



$\text{area} = \int_{y=0}^{b}\int_{x=0}^{a} dA$

$= \int_{y=0}^{b}\int_{x=0}^{a} dx\,dy$

$= \int_{y=0}^{b}\left(\int_{x=0}^{a} dx\right) dy$

$= \int_{y=0}^{b} a\,dy$

$= a\int_{y=0}^{b} dy$

$= ab$

Area of a triangle Example 91

> Find the area of a 45-45-90 right triangle having legs a.

> Let the triangle's hypotenuse run from the origin to the point (a, a), and let its legs run from the origin to (0, a), and then to (a, a). In other words, the triangle sits on top of its hypotenuse. Then the integral can be set up the same way as the one before, but for a particular value of y, values of x only run from 0 (on the y axis) to y (on the hypotenuse). We then have

$\text{area} = \int_{y=0}^{a}\int_{x=0}^{y} dA$

$= \int_{y=0}^{a}\int_{x=0}^{y} dx\,dy$

$= \int_{y=0}^{a}\left(\int_{x=0}^{y} dx\right) dy$

$= \int_{y=0}^{a} y\,dy$

$= \frac{a^2}{2}$

Note that in this example, because the upper end of the x values depends on the value of y, it makes a difference which order we do the integrals in. The x integral has to be on the inside, and we have to do it first.

Volume of a cube Example 92

> Find the volume of a cube with sides of length a.

> This is a three-dimensional example, so we'll have integrals nested three deep, and the thing we're integrating is the volume dV = dx dy dz.

$\text{volume} = \int_{z=0}^{a}\int_{y=0}^{a}\int_{x=0}^{a} dV$

$= \int_{z=0}^{a}\int_{y=0}^{a}\int_{x=0}^{a} dx\,dy\,dz$

$= \int_{z=0}^{a}\int_{y=0}^{a} a\,dy\,dz$

$= a\int_{z=0}^{a}\int_{y=0}^{a} dy\,dz$

$= a\int_{z=0}^{a} a\,dz$

$= a^2\int_{z=0}^{a} dz$

$= a^3$

Area of a circle Example 93

> Find the area of a circle.

> To make it easy, let's find the area of a semicircle and then double it. Let the circle's radius be R, and let it be centered on the origin and bounded below by the x axis. Then the curved edge is given by the equation $R^2 = x^2 + y^2$, or $y = \sqrt{R^2 - x^2}$. Since the y integral's limit depends on x, the x integral has to be on the outside. The area is

$\text{area} = \int_{x=-R}^{R}\int_{y=0}^{\sqrt{R^2 - x^2}} dy\,dx$

$= \int_{x=-R}^{R} \sqrt{R^2 - x^2}\,dx$

$= R\int_{x=-R}^{R} \sqrt{1 - (x/R)^2}\,dx$

Substituting u = x/R,

$\text{area} = R^2\int_{u=-1}^{1} \sqrt{1 - u^2}\,du$

The definite integral equals $\pi/2$, as you can find using a trig substitution or simply by looking it up in a table, and the result is, as expected, $\pi R^2/2$ for the area of the semicircle. Doubling it, we find the expected result of $\pi R^2$ for a full circle.

9.2 Applications

Up until now, the integrand of the innermost integral has always been 1, so we really could have done all the double integrals as single integrals. The following example is one in which you really need to do iterated integrals.




a / The famous tightrope 
walker Charles Blondin 
uses a long pole for its 
large moment of inertia. 



Moments of inertia Example 94

The moment of inertia is a measure of how difficult it is to start an object rotating (or stop it). For example, tightrope walkers carry long poles because they want something with a big moment of inertia. The moment of inertia is defined by $I = \int R^2\,dm$, where dm is the mass of an infinitesimally small portion of the object, and R is the distance from the axis of rotation.

To start with, let's do an example that doesn't require iterated integrals. Let's calculate the moment of inertia of a thin rod of mass M and length L about a line perpendicular to the rod and passing through its center.

$I = \int R^2\,dm$

$= \int_{-L/2}^{L/2} x^2\,\frac{M}{L}\,dx$   [R = |x|, so $R^2 = x^2$]

$= \frac{1}{12}ML^2$

Now let's do one that requires iterated integrals: the moment of inertia of a cube of side b, for rotation about an axis that passes through its center and is parallel to four of its faces.

Let the origin be at the center of the cube, and let x be the rotation axis.

$I = \int R^2\,dm$

$= \rho\int R^2\,dV$

$= \rho\int_{-b/2}^{b/2}\int_{-b/2}^{b/2}\int_{-b/2}^{b/2} (y^2 + z^2)\,dx\,dy\,dz$

$= \rho b\int_{-b/2}^{b/2}\int_{-b/2}^{b/2} (y^2 + z^2)\,dy\,dz$

The fact that the last step is a trivial integral results from the symmetry of the problem. The integrand of the remaining double integral breaks down into two terms, each of which depends on only one of the variables, so we break it into two integrals,

$I = \rho b\int_{-b/2}^{b/2}\int_{-b/2}^{b/2} y^2\,dy\,dz + \rho b\int_{-b/2}^{b/2}\int_{-b/2}^{b/2} z^2\,dy\,dz$

which we know have identical results. We therefore only need to evaluate one of them and double the result:

$I = 2\rho b\int_{-b/2}^{b/2}\int_{-b/2}^{b/2} z^2\,dy\,dz$

$= 2\rho b^2\int_{-b/2}^{b/2} z^2\,dz$

$= \frac{1}{6}\rho b^5$

$= \frac{1}{6}Mb^2$
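The cube result can be checked by brute force, in the same spirit as counting the squares on the chess board. The Python sketch below chops the cube into small cells and adds up R² dm for each one; the function name, the midpoint sampling, and the grid size `n` are all choices made for this illustration, not part of the derivation above.

```python
def cube_moment_of_inertia(mass, side, n=40):
    """Approximate I = rho * triple integral of (y^2 + z^2) dx dy dz
    for a cube centered on the origin, rotating about the x axis."""
    rho = mass / side**3                 # uniform density
    h = side / n                         # edge length of each small cell
    centers = [-side / 2 + (i + 0.5) * h for i in range(n)]
    total = 0.0
    for z in centers:                    # outermost integral, over z
        for y in centers:                # middle integral, over y
            for x in centers:            # innermost integral, over x
                total += (y * y + z * z) * h**3
    return rho * total

print(cube_moment_of_inertia(3.0, 2.0), 3.0 * 2.0**2 / 6)
```

Because the integrand does not depend on x, the innermost loop just multiplies by the side length, mirroring the "trivial integral" remark in the example. The numerical value approaches $Mb^2/6$ as n is increased.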







9.3 Polar coordinates

b / René Descartes (1596-1650)

Philosopher and mathematician René Descartes originated the idea of describing plane geometry using (x, y) coordinates measured from a pair of perpendicular coordinate axes. These rectangular coordinates are known as Cartesian coordinates, in his honor.

c / Polar coordinates.

As a logical extension of Descartes' idea, one can find different ways of defining coordinates on the plane, such as the polar coordinates in figure c. In polar coordinates, the differential of area, figure d, can be written as $da = R\,dR\,d\phi$. The idea is that since dR and $d\phi$ are infinitesimally small, the shaded area in the figure is very nearly a rectangle, measuring dR in one dimension and $R\,d\phi$ in the other. (The latter follows from the definition of radian measure.)

d / The differential of area in polar coordinates.

Example 95

> A disk has mass M and radius b. Find its moment of inertia for rotation about the axis passing perpendicularly through its center.

>

$I = \int R^2\,dM$

$= \int R^2\,\frac{dM}{da}\,da$

$= \int R^2\,\frac{M}{\pi b^2}\,da$

$= \frac{M}{\pi b^2}\int_{R=0}^{b}\int_{\phi=0}^{2\pi} R^2 \cdot R\,d\phi\,dR$

$= \frac{M}{\pi b^2}\int_{R=0}^{b}\int_{\phi=0}^{2\pi} R^3\,d\phi\,dR$

$= \frac{2M}{b^2}\int_{R=0}^{b} R^3\,dR$

$= \frac{Mb^2}{2}$
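To see the polar area element $R\,dR\,d\phi$ at work numerically, one can sum R² dM over a grid of small polar cells. The following Python sketch is only an illustration under assumed names (`disk_moment_of_inertia`, `n_r`, `n_phi`) and uses midpoint sampling in R.

```python
import math

def disk_moment_of_inertia(mass, radius, n_r=200, n_phi=100):
    """Approximate I = (M / (pi b^2)) * double integral of R^2 * R dphi dR."""
    sigma = mass / (math.pi * radius**2)   # mass per unit area, dM/da
    d_r = radius / n_r
    d_phi = 2 * math.pi / n_phi
    total = 0.0
    for i in range(n_r):                   # outer integral, over R
        r = (i + 0.5) * d_r                # midpoint of the i-th ring
        for _ in range(n_phi):             # inner integral, over phi
            da = r * d_r * d_phi           # area of one polar cell
            total += r**2 * da
    return sigma * total

print(disk_moment_of_inertia(2.0, 3.0), 2.0 * 3.0**2 / 2)
```

The factor of r inside `da` is what distinguishes the polar cell from a Cartesian one; leaving it out would give the wrong power of b in the answer.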
e / The function $e^{-x^2}$, example 96.

Example 96

In statistics, the standard "bell curve" (also known as the normal distribution or Gaussian) is shaped like $e^{-x^2}$. An area under this curve is proportional to the probability that x lies within a certain range. To fix the constant of proportionality, we need to evaluate

$I = \int_{-\infty}^{\infty} e^{-x^2}\,dx$

which corresponds to a probability of 1. As discussed on p. 93, the corresponding indefinite integral can't be done in closed form. The definite integral from $-\infty$ to $+\infty$, however, can be evaluated by the following devious trick due to Poisson. We first write $I^2$ as a product of two copies of the integral.

$I^2 = \left(\int_{-\infty}^{\infty} e^{-x^2}\,dx\right)\left(\int_{-\infty}^{\infty} e^{-x^2}\,dx\right)$

Since the variable of integration x is a "dummy" variable, we can choose it to be any letter of the alphabet. Let's change the second one to y:

$I^2 = \left(\int_{-\infty}^{\infty} e^{-x^2}\,dx\right)\left(\int_{-\infty}^{\infty} e^{-y^2}\,dy\right)$

This is in principle a pointless and trivial change, but it suggests visualizing the right-hand side in the Cartesian plane, and considering it as the integral of a single function that depends on both x and y:

$I^2 = \int_{y=-\infty}^{\infty}\int_{x=-\infty}^{\infty} \left(e^{-y^2}e^{-x^2}\right) dx\,dy$

Switching to polar coordinates, we have

$I^2 = \int_{\phi=0}^{2\pi}\int_{R=0}^{\infty} e^{-R^2} R\,dR\,d\phi$

$= 2\pi\int_{R=0}^{\infty} e^{-R^2} R\,dR$

which can be done using the substitution $u = R^2$, $du = 2R\,dR$:

$I^2 = 2\pi\int_{0}^{\infty} e^{-u}\,(du/2)$

$= \pi$
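Since $I^2 = \pi$ means $I = \sqrt{\pi}$, the trick is easy to test numerically. The Python sketch below approximates both the original one-dimensional integral and the radial integral from the polar form; the function names, the cutoff `limit` standing in for infinity, and the number of slices `n` are assumptions made for this illustration.

```python
import math

def gaussian_integral(limit=10.0, n=4000):
    """Midpoint-rule estimate of the integral of exp(-x^2) from -limit to +limit."""
    h = 2 * limit / n
    return sum(math.exp(-(-limit + (i + 0.5) * h) ** 2) for i in range(n)) * h

def polar_form(limit=10.0, n=4000):
    """Midpoint-rule estimate of 2 pi times the integral of exp(-R^2) R dR."""
    h = limit / n
    radial = sum(math.exp(-((i + 0.5) * h) ** 2) * (i + 0.5) * h for i in range(n)) * h
    return 2 * math.pi * radial

print(gaussian_integral() ** 2, polar_form(), math.pi)
```

Cutting the range off at 10 is harmless because the integrand is already of order $e^{-100}$ there.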



9.4 Spherical and cylindrical coordinates

In cylindrical coordinates $(R, \phi, z)$, z measures distance along the axis, R measures distance from the axis, and $\phi$ is an angle that wraps around the axis.

f / Cylindrical coordinates.

The differential of volume in cylindrical coordinates can be written as $dv = R\,dR\,dz\,d\phi$. This follows from adding a third dimension, along the z axis, to the rectangle in figure d.

Example 97

> Show that the expression for dv has the right units.

> Angles are unitless, since the definition of radian measure involves a distance divided by a distance. Therefore the only factors in the expression that have units are R, dR, and dz. If these three factors are measured, say, in meters, then their product has units of cubic meters, which is correct for a volume.

Example 98

> Find the volume of a cone whose height is h and whose base has radius b.

> Let's plan on putting the z integral on the outside of the sandwich. That means we need to express the radius $r_{max}$ of the cone in terms of z. This comes out nice and simple if we imagine the cone upside down, with its tip at the origin. Then since we have $r_{max}(z = 0) = 0$, and $r_{max}(h) = b$, evidently $r_{max} = zb/h$.

$v = \int dv$

$= \int_{z=0}^{h}\int_{R=0}^{zb/h}\int_{\phi=0}^{2\pi} R\,d\phi\,dR\,dz$

$= 2\pi\int_{z=0}^{h}\int_{R=0}^{zb/h} R\,dR\,dz$

$= 2\pi\int_{z=0}^{h} \frac{(zb/h)^2}{2}\,dz$

$= \pi(b/h)^2\int_{z=0}^{h} z^2\,dz$

$= \frac{\pi b^2 h}{3}$

As a check, we note that the answer has units of volume. This is the classical result, known by the ancient Egyptians, that a cone has one third the volume of its enclosing cylinder.

In spherical coordinates $(r, \theta, \phi)$, the coordinate r measures the distance from the origin, and $\theta$ and $\phi$ are analogous to latitude and longitude, except that $\theta$ is measured down from the pole rather than from the equator.

g / Spherical coordinates.

The differential of volume in spherical coordinates is $dv = r^2\sin\theta\,dr\,d\theta\,d\phi$.

Example 99

> Find the volume of a sphere.

>

$v = \int dv$

$= \int_{\theta=0}^{\pi}\int_{r=0}^{b}\int_{\phi=0}^{2\pi} r^2\sin\theta\,d\phi\,dr\,d\theta$

$= 2\pi\int_{\theta=0}^{\pi}\int_{r=0}^{b} r^2\sin\theta\,dr\,d\theta$

$= 2\pi\cdot\frac{b^3}{3}\int_{\theta=0}^{\pi} \sin\theta\,d\theta$

$= \frac{4\pi b^3}{3}$
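Both volume elements can be tried out numerically. In the Python sketch below, the cone is summed in cylindrical coordinates and the sphere in spherical coordinates. In both cases the integrand does not depend on the angle $\phi$, so that integral is replaced by its value $2\pi$. The function names and the grid size `n` are assumptions of this illustration.

```python
import math

def cone_volume(b, h, n=200):
    """Sum dv = R dR dz dphi over a cone with its tip at the origin."""
    dz = h / n
    total = 0.0
    for i in range(n):                      # outer integral, over z
        z = (i + 0.5) * dz
        r_max = z * b / h                   # radius of the cone at height z
        dR = r_max / n
        for j in range(n):                  # inner integral, over R
            R = (j + 0.5) * dR
            total += 2 * math.pi * R * dR * dz
    return total

def sphere_volume(b, n=200):
    """Sum dv = r^2 sin(theta) dr dtheta dphi over a sphere of radius b."""
    dr = b / n
    dtheta = math.pi / n
    total = 0.0
    for i in range(n):                      # outer integral, over theta
        theta = (i + 0.5) * dtheta
        for j in range(n):                  # inner integral, over r
            r = (j + 0.5) * dr
            total += 2 * math.pi * r**2 * math.sin(theta) * dr * dtheta
    return total

print(cone_volume(1.5, 2.0), math.pi * 1.5**2 * 2.0 / 3)
print(sphere_volume(1.0), 4 * math.pi / 3)
```

Note that in the cone the upper limit of the inner loop depends on the outer variable z, which is why the z integral has to sit on the outside.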
Problems

1 Pascal's snail (named after 
Etienne Pascal, father of Blaise 
Pascal) is the shape shown in the 
figure, defined by $R = b(1 + \cos\theta)$
in polar coordinates. 

(a) Make a rough visual estimate 
of its area from the figure. 

(b) Find its area exactly, and check 
against your result from part a. 

(c) Show that your answer has the 
right units. [Thompson, 1919] 




Problem 1: Pascal's snail with b = 1.



2 A cone with a curved base is
defined by $r < b$ and $\theta < \pi/4$ in
spherical coordinates.

(a) Find its volume. 

(b) Show that your answer has the 
right units. 

3 Find the moment of inertia of 
a sphere for rotation about an axis 
passing through its center. 

4 A jump-rope swinging in circles 
has the shape of a sine function. 



Find the volume enclosed by the 
swinging rope, in terms of the ra- 
dius b of the circle at the rope's 
fattest point, and the straight-line 
distance $\ell$ between the ends.

5 A curvy-sided cone is defined in cylindrical coordinates by $0 < z < h$ and $R < kz^2$. (a) What units are implied for the constant k? (b) Find the volume of the shape. (c) Check that your answer to b has the right units.

6 The discovery of nuclear fis- 
sion was originally explained by 
modeling the atomic nucleus as a 
drop of liquid. Like a water bal- 
loon, the drop could spin or vi- 
brate, and if the motion became 
sufficiently violent, the drop could 
split in half — undergo fission. It 
was later learned that even the 
nuclei in matter under ordinary 
conditions are often not spherical 
but deformed, typically with an 
elongated ellipsoidal shape like an 
American football. One simple 
way of describing such a shape is 
with the equation 



$r < b[1 + c(\cos^2\theta - k)]$

where c = 0 for a sphere, c > 0 for
an elongated shape, and c < 0 for
a flattened one. Usually for nuclei
in ordinary matter, c ranges from
about 0 to +0.2. The constant k
is introduced because without it, a 
change in c would entail not just 
a change in the shape of the nu- 
cleus, but a change in its volume 
as well. Observations show, on the 
contrary, that the nuclear fluid is 
highly incompressible, just like ordinary
water, so the volume of the
nucleus is not expected to change
significantly, even in violent processes
like fission. Calculate the
volume of the nucleus, throwing
away terms of order $c^2$ or higher,
and show that k = 1/3 is required 
in order to keep the volume con- 
stant. 

7 This problem is a continua- 
tion of problem 6, and assumes the 
result of that problem is already 
known. The nucleus $^{168}$Er has the
type of elongated ellipsoidal shape
described in that problem, with
c > 0. Its mass is $2.8 \times 10^{-25}$ kg,
it is observed to have a moment
of inertia of $2.62 \times 10^{-54}$ kg·m$^2$
for end-over-end rotation, and its
shape is believed to be described
by $b \approx 6 \times 10^{-15}$ m and $c \approx 0.2$.
Assuming that it rotated rigidly, 
the usual equation for the moment 
of inertia could be applicable, but 
it may rotate more like a water bal- 
loon, in which case its moment of 
inertia would be significantly less 
because not all the mass would ac- 
tually flow. Test which type of ro- 
tation it is by calculating its mo- 
ment of inertia for end-over-end ro- 
tation and comparing with the ob- 
served moment of inertia. * 

8 Von Karman found empirically 
that when a fluid flows turbulently 
through a cylindrical pipe, the velocity
of flow v varies according
to the "1/7 power law," $v/v_0 = (1 - r/R)^{1/7}$,
where $v_0$ is the velocity
at the center of the pipe, R is



the radius of the pipe, and r is the 
distance from the axis. Find the 
average velocity at which water is 
transported through the pipe. 
Formal definition of the tangent line

Given a function x(t), consider any point P = (a, x(a)) on its graph.
Let the function $\ell(t)$ be a line passing through P. We say that $\ell$ cuts
through x at P if there exists some real number d > 0 such that the
graph of $\ell$ is on one side of the graph of x for all a − d < t < a, and is
on the other side for all a < t < a + d.

Definition (Marsden 1 ): A line $\ell$ through P is said to be the line tangent
to x at P if all lines through P with slopes less than that of $\ell$ cut
through x in one direction, while all lines with slopes greater than $\ell$'s
cut through it in the opposite direction.

The reason for the complication in the definition is that there are cases
in which the function is smooth and well-behaved throughout a certain
region, but for a certain point P in that region, all lines through P cut
through the graph. For example, the function $x(t) = t^3$ is blessed everywhere
with lines that don't cut through it; everywhere, that is, except at
t = 0, which is an inflection point (p. 17). Our definition fills in the
"gap tooth" in the derivative function in the obvious way.



1 Calculus Unlimited, by Jerrold Marsden and Alan Weinstein,
http://resolver.caltech.edu/CaltechBOOK:1981.001
Detours



Derivatives of polynomials 

Some ideas in this proof are due to Michael Livshits. 

We want to prove that the derivative of $t^k$ is $kt^{k-1}$, for k = 1, 2, 3, ....
It suffices to prove that the derivative equals k when evaluated at t = 1,
since we can then apply the kind of scaling argument 2 used on page 12
to show that the derivative of $t^2/2$ was t. The proposed tangent line at
(1, 1) has the equation $\ell(t) = k(t - 1) + 1$, so what we need to prove is
that the polynomial $t^k - [k(t - 1) + 1]$ is greater than or equal to zero
throughout some region around t = 1.

Figure a shows a typical case. The graph of $3(t - 1) + 1$ lies entirely
below the graph of $t^3$ in a large region. It does pop back up above it at
t = −2, but that's far away, and the definition of the tangent line only
requires that some region around (1, 1) be free of such crossing points.
In fact, a little experimentation shows that these crossings occur only
for odd k, and always for t < 0. This suggests that we ought to aim for
a general proof that there are no crossings for t > 0.




a / The graphs of $t^3$ and $3(t - 1) + 1$.



Suppose that such a crossing happens at the point $(t, t^k)$. Then the
slope of the line $\ell(t)$ is k, so we must have

$\frac{t^k - 1}{t - 1} = k$



The left-hand side is the quotient of two polynomials, and we expect it 
to divide without a remainder, because t k — 1 equals zero at t = 1, and 



2 Scaling fails in the special case of t = 0 and odd k, so we have to fill in the "gap
tooth" as mentioned in the preceding section.
therefore it must have t − 1 as a factor. If we try the example of k = 3,
we find that the quotient has the very simple form $t^2 + t + 1$, i.e., a
polynomial of order k − 1 whose coefficients are all equal to 1. We can
easily verify that this works for all k > 1, by checking the multiplication

$t^k - 1 = (t - 1)(t^{k-1} + t^{k-2} + \ldots + 1)$

in which all the terms in the expansion of the right-hand side cancel
except for $t^k$ and −1. Let's refer to the quotient as $Q(t) = t^{k-1} + t^{k-2} + \ldots + 1$.
How can we get Q(t) = k? Clearly we have a solution for t = 1,
since there are k terms, each equal to 1. For t > 1, all the terms except
the constant one are greater than 1, so there can't be any solution. For
0 < t < 1, all the terms except the constant one are positive and less
than 1, so again there can't be any solution. This completes the proof
that there are no crossings for t > 0, which establishes the desired result.
The result is also true for any real value of k; see example 22 on p. 41.
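The two facts used in this proof, the factorization and the behavior of Q(t) on either side of t = 1, can be spot-checked with exact rational arithmetic. The Python sketch below is only an illustration: the function name `Q` follows the text, while the sample values of t and the range of k are arbitrary choices, and a handful of examples is of course not a proof.

```python
from fractions import Fraction

def Q(t, k):
    """The quotient t^(k-1) + t^(k-2) + ... + 1, written as a sum of k terms."""
    return sum(t**j for j in range(k))

samples = [Fraction(1, 10), Fraction(1, 2), Fraction(9, 10), Fraction(2), Fraction(7, 3)]

for k in range(2, 8):
    for t in samples:
        # the factorization t^k - 1 = (t - 1) Q(t), checked exactly
        assert t**k - 1 == (t - 1) * Q(t, k)
        # Q(t) falls short of k for 0 < t < 1 and exceeds it for t > 1
        assert (Q(t, k) < k) if t < 1 else (Q(t, k) > k)
    assert Q(Fraction(1), k) == k

print("all checks passed")
```

Using `Fraction` rather than floating point means that the equality test for the factorization is exact.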

Details of the proof of the derivative of the sine function 

Some ideas in this proof are due to Jerome Keisler (see references, p. 
195). 

On page 28, I computed the derivative of sin t to be cos t as follows:

$dx = \sin(t + dt) - \sin t$

$= \sin t\cos dt + \cos t\sin dt - \sin t$

$= \cos t\,dt + \ldots$

We want to prove that the error ". . . " introduced by the small-angle
approximations really is of order $dt^2$.

A quick and dirty way to check whether this is likely to be true is to 
use Inf to calculate sin(t + dt) at some specific value of t. For example, 
at t = 1 we have this result: 

: sin(1+d)

(0.84147)+(0.54030)d
+(-0.42074)d^2+(-0.09006)d^3
+(0.03506)d^4

The small-angle approximations give $\sin(1 + d) \approx \sin 1 + (\cos 1)d$. The
coefficients of the first two terms of the exact result are, as expected,
sin(1) = 0.84147 and cos(1) = 0.5403..., so although the small-angle
approximations have introduced some errors, they involve only higher
powers of dt, as claimed.
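A similar quick and dirty check can be made with ordinary floating-point numbers instead of Inf, by replacing the infinitesimal with a small but finite d. In the Python sketch below, the function name `small_angle_error` and the particular values of d are assumptions of this illustration. If the error is of order d², then halving d should cut the error by about a factor of four.

```python
import math

def small_angle_error(t, d):
    """Error left over when sin(t + d) is approximated by sin t + (cos t) d."""
    return math.sin(t + d) - (math.sin(t) + math.cos(t) * d)

t = 1.0
for d in (1e-1, 1e-2, 1e-3):
    err = small_angle_error(t, d)
    print(d, err, err / d**2)

ratio = small_angle_error(t, 1e-2) / small_angle_error(t, 5e-3)
print(ratio)
```

The last column settles down near −0.42, matching the coefficient of d² in the Inf output, and the ratio comes out close to 4.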

The demonstration with Inf has two shortcomings. One is that it only 
works for t = 1, but we need to prove the result for all values
of t. That doesn't mean that the check for t = 1 was useless. Even 
though a general mathematical statement about all numbers can never 
be proved by demonstrating specific examples for which it succeeds, a 
single counterexample suffices to disprove it. The check for t = 1 was 
worth doing, because if the first term had come out to be 0.88888, it 
would have immediately disproved our claim, thereby saving us from 
wasting hours attempting to prove something that wasn't true. 

The other problem is that I've never explained how Inf calculates this 
kind of thing. The answer is that it uses something called a Taylor 
series, discussed in section 7.4. Using Inf here without knowing yet 
how Taylor series work is like using your calculator as a "black box" 
to extract the square root of 2 without knowing how it does it. Not
knowing the inner workings of the black box makes the demonstration 
less than satisfying. 

In any case, this preliminary check makes it sound like it's reasonable 
to go on and try to produce a real proof. We have 

$\sin(t + dt) = \sin t + \cos t\,dt - E$ ,

where the error E introduced by the approximations is

$E = \sin t\,(1 - \cos dt) + \cos t\,(dt - \sin dt)$

Let the radius of the circle in figure b be one, so AD is cos dt and CD is
sin dt. The area of the shaded pie slice is dt/2, and the area of triangle
ABC is sin dt/2, so the error made in the approximation $\sin dt \approx dt$
equals twice the area of the dish shape formed by line BC and arc BC.
Therefore dt − sin dt is less than the area of rectangle CEBD. But CEBD
has both an infinitesimal width and an infinitesimal height, so this error
is of no more than order $dt^2$.

b / Geometrical interpretation of the error term.

For the approximation $\cos dt \approx 1$, the error (represented by BD) is
$1 - \cos dt = 1 - \sqrt{1 - \sin^2 dt}$, which is less than $1 - \sqrt{1 - dt^2}$,
since sin dt < dt. Therefore this error is of order $dt^2$.



Formal statement of the transfer principle 

On page 33, I gave an informal description of the transfer principle. The 
idea being expressed was that the phrases "for any" and "there exists" 
can only be used in phrases like "for any real number x" and "there 
exists a real number y such that. . . " The transfer principle does not 
apply to statements like "there exists an integer x such that. . . " or 
even "there exists a subset of the real numbers such that. . . " 

The way to state the transfer principle more rigorously is to get rid of 
the ambiguities of the English language by restricting ourselves to a well- 
defined language of mathematical symbols. This language has symbols 
$\forall$ and $\exists$, meaning "for all" and "there exists," and these are called
quantifiers. A quantifier is always immediately followed by a variable,
and then by a statement involving that variable. For example, suppose
we want to say that a number greater than 1 exists. We can write the
statement $\exists x\ x > 1$, read as "there exists a number x such that x is
greater than 1." We don't actually need to say "there exists a number 
x in the set of real numbers such that . . . ," because our intention here 
is to make statements that can be translated back and forth between 
the reals and the hyperreals. In fact, we forbid this type of explicit 
reference to the domain to which the quantifiers apply. This restriction 
is described technically by saying that we're only allowing first-order 
logic. 

Quantifiers can be nested. For example, I can state the commutativity
of addition as $\forall x \forall y\ x + y = y + x$, and the existence of additive inverses
as $\forall x \exists y\ x + y = 0$.

After the quantifier and the variable, we have some mathematical assertion,
in which we're allowed to use the symbols =, >, $\times$ and + for
the basic operations of arithmetic, and also parentheses and the logical
operators $\neg$, $\wedge$ and $\vee$ for "not," "and," and "or." Although we will
often find it convenient to use other symbols, such as 0, 1, −, /, <,
$\ne$, etc., these are not strictly necessary. We use them only as a way of
making the formulas more readable, with the understanding that they
could be translated into the more basic symbols. For instance, I can
restate $\exists x\ x > 1$ as $\exists x \exists y \forall z\ yz = z \wedge x > y$. The number y ends up
just being a name for 1, because it's the only number that will always
satisfy yz = z.

Finally, these statements need to satisfy certain syntactic rules. For
example, we can't have a string of symbols like $x + \times y$, because the
operators + and $\times$ are supposed to have numbers on both sides.

A finite string of symbols satisfying all the above rules is called a well- 
formed formula (wff) in first-order logic. 

The transfer principle states that a wff is true on the real numbers if 
and only if it is true on the hyperreal numbers. 

If you look in an elementary algebra textbook at the statement of all the 
elementary axioms of the real number system, such as commutativity 
of multiplication, associativity of addition, and so on, you'll see that 
they can all be expressed in terms of first-order logic, and therefore 
you can use them when manipulating hyperreal numbers. However, it's 
not possible to fully characterize the real number system without giving 
at least some further axioms that cannot be expressed in first order. 
There is more than one way to set up these additional axioms, but 
for example one common axiom to use is the Archimedean principle, 
which states that there is no number that is greater than 1, greater 
than 1 + 1, greater than 1 + 1 + 1, and so on. If we try to express 
this as a well-formed formula in first order logic, one attempt would
be $\neg\exists x\ x > 1 \wedge x > 1 + 1 \wedge x > 1 + 1 + 1 \ldots$, where the . . .
indicates that the string of symbols would have to go on forever. This
doesn't work because a well-formed formula has to be a finite string
of symbols. Another attempt would be $\neg\exists x \forall n \in \mathbb{N}\ x > n$, where $\mathbb{N}$
means the set of integers. This one also fails to be a wff in first-order
logic, because in first-order logic we're not allowed to explicitly refer 
to the domain of a quantifier. We conclude that the transfer principle 
does not necessarily apply to the Archimedean principle, and in fact 
the Archimedean principle is not true on the hyperreals, because they 
include numbers that are infinite. 

Now that we have a thorough and rigorous understanding of what the 
transfer principle says, the next obvious question is why we should be- 
lieve that it's true. This is discussed in the following section. 
Is the transfer principle true?

The preceding section stated the transfer principle in rigorous language. 
But why should we believe that it's true? 

One approach would be to begin deducing things about the hyperreals, 
and see if we can deduce a contradiction. As a starting point, we can 
use the axioms of elementary algebra, because the transfer principle 
tells us that those apply to the hyperreals as well. Since we also assume 
that the Archimedean principle does not hold for the hyperreals, we 
can also base our reasoning on that, and therefore many of the things 
we can prove will be things that are true for the hyperreals, but false 
for the reals. This is essentially what mathematicians started doing 
immediately after Newton and Leibniz invented the calculus, and they 
were immediately successful in producing contradictions. However, they 
weren't using formally defined logical systems, and they hadn't stated 
anything as specific and rigorous as the transfer principle. In particular, 
they didn't understand the need for anything like our restriction of the 
transfer principle to first-order logic. If we could reach a contradiction 
based on the more modern, rigorous statement of the transfer principle, 
that would be a different matter. It would tell us that one of two things 
was true: either (1) the hyperreal number system lacks logical self- 
consistency, or (2) both the hyperreals and the reals lack self-consistency. 

Around 1960, however, Abraham Robinson proved that the reals and the
hyperreals have the same level of consistency: one is self-consistent if 
and only if the other is. In other words, if the hyperreals harbor a ticking 
logical time bomb, so do the reals. Since most mathematicians don't 
lose much sleep worrying about a lack of self-consistency in the real 
number system, this is generally taken as meaning that infinitesimals 
have been rehabilitated. In fact, it gives them an even higher level 
of respectability than they had in the era of Gauss and Euler, when 
they were widely used, but mathematicians knew a valid style of proof 
involving infinitesimals only because they'd slowly developed the right 
"Spidey sense." 

But how in the world could Robinson have proved such a thing? It seems 
like a daunting task. There is an infinite number of possible logical trains 
of argument in mathematics. How could he have demonstrated, with a 
stroke of a pen, that none of them could ever lead to a contradiction 
(unless it indicated a contradiction lurking in the real number system 
as well)? Obviously it's not possible to check them all explicitly. 

The way modern logicians prove such things is usually by using models. 
For an easy example of a model, consider Euclidean geometry. Euclid
believed that the following four postulates 3 were all self-evident: 

1. Let the following be postulated: to draw a straight line from any 
point to any point. 

2. To extend a finite straight line continuously in a straight line. 

3. To describe a circle with any center and radius. 

4. That all right angles are equal to one another. 

These postulates, which today we would call "axioms," played the same 
role with respect to Euclidean geometry that the elementary axioms of 
arithmetic play for the real number system. 

Euclid also found that he needed a fifth postulate in order to prove many 
of his most important theorems, such as the Pythagorean theorem. I'll 
state a different axiom that turns out to be equivalent to it: 

5. Playfair's version of the parallel postulate: Given any infinite 
line L, and any point P not on that line, there exists a unique infinite 
line through P that never crosses L. 

The ancients believed this to be less obviously self-evident than the first 
four, partly because if you were given the two lines, it could theoretically 
take an infinite amount of time to inspect them and verify that they 
never crossed, even at some very distant point. Euclid avoided even 
mentioning infinite lines in postulates 1-4, and he considered postulate 5 
to be so much less intuitively appealing in comparison that he organized 
the Elements so that the first 28 propositions were those that could be 
proved without resorting to it. Continuing the analogy with the reals 
and hyperreals, the parallel postulate plays the role of the Archimedean 
principle: a statement about infinity that we don't feel quite so sure 
about. 

For centuries, geometers tried to prove the parallel postulate from the
first four. The trouble with this kind of thing was that it could be difficult
to tell what was a valid proof and what wasn't. The postulates were 
written in an ambiguous human language, not a formal logical system. 
As an example of the kind of confusion that could result, suppose we 
assume the following postulate, 5', in place of 5: 



3 Modified slightly by me from a translation by T.L. Heath, 1925.
5': Given any infinite line L, and any point P not on that line, every
infinite line through P crosses L. 

Postulate 5' plays the role for noneuclidean geometry that the negation 
of the Archimedean principle plays for the hyperreals. It tells us we're 
not in Kansas anymore. If a geometer can start from postulates 1-4 
and 5' and arrive at a contradiction, then he's made significant progress 
toward proving that postulate 5 has to be true based on postulates 1-4. 
(He would also have to disprove another version of the postulate, in 
which there is more than one parallel through P.) For centuries, there 
have been reasonable-sounding arguments that seemed to give such a 
contradiction. For instance, it was proved that a geometry with 5' in it 
was one in which distances were limited to some finite maximum. This 
would appear to contradict postulate 3, since there would be a limit 
on the radius of a circle. But there's plenty of room for disagreement 
here, because the ancient Greeks didn't have any notion of a set of real 
numbers. For them, the thing we would call a number was simply a 
finite straight line (line segment) with a certain length. If postulate 
3 says that we can make a circle given any radius, it's reasonable to 
interpret that as a statement that given any finite straight line as the 
specification of the radius, we can make the circle. There is then no 
contradiction, because the too-long radius can't be specified in the first 
place. This muddle is similar to the kind of confusion that reigned for 
centuries after Newton: did infinitesimals lead to contradictions? 

In the 19th century, Lobachevsky and Bolyai came up with a version of 
Euclid's axioms that was more rigorously defined, and that was care- 
fully engineered to avoid the kinds of contradictions that had previously 
been discovered in noneuclidean geometry. This is analogous to the in- 
vention of the transfer principle and the realization that the restriction 
to first-order logic was necessary. Lobachevsky and Bolyai slaved away 
for year after year proving new results in noneuclidean geometry, won- 
dering whether they would ever reach a contradiction. Eventually they 
started to doubt that there were ever going to be contradictions, and 
finally they proved that the contradictions didn't exist. 

The technique for proving consistency was to make a model of the noneu- 
clidean system. Consider geometry done on the surface of a sphere. The 
word "line" in the axioms now has to be understood as referring to a 
great circle, i.e., one with the same radius as the sphere. The parallel 
postulate fails, because parallels don't exist: every great circle intersects 
every other great circle. One modification has to be made to the model 
in order to make it consistent with the first postulate. The constructions 
described in Euclid's postulates are tacitly assumed to be unique (and
in more rigorous formulations are explicitly stated to be so). We want 
there to be a unique line defined by any two distinct points. This works 
fine on the sphere as long as the points aren't too far apart, but it fails if 
the points are antipodes, i.e., they lie at opposite sides of the sphere. For 
example, every line of longitude on the Earth's surface passes through 
both poles. The solution to this problem is to modify what we mean by 
"point." Points at each other's antipodes are considered to be the same 
point. (Or, equivalently, we can do geometry on a hemisphere, but agree 
that when we go off one edge, we "wrap around" to the opposite side.) 

This spherical model obeys all the postulates of this particular system of 
noneuclidean geometry. But consider now that we constructed it inside 
a surrounding three-dimensional space in which the parallel postulate 
does hold. Now suppose we keep on proving theorems in this system 
of noneuclidean geometry, filling up page after page with proofs using 
words like "line," which we mentally associate with great circles on a 
certain sphere — and eventually we reach a contradiction. But now we 
can go back through our proofs, and in every place where the word "line" 
occurs we can cross it out with a red pencil and put in "great circle on 
this particular sphere." It would now be a proof about Euclidean geom- 
etry, and the contradiction would prove that Euclidean geometry lacked 
self-consistency. We therefore arrive at the result that if noneuclidean 
geometry is inconsistent, so is Euclidean geometry. Since nobody be- 
lieves that Euclidean geometry is inconsistent, this is considered the 
moral equivalent of proving noneuclidean geometry to be consistent. 

If you've been keeping the system of analogies in mind as you read this 
story, it should be clear what's coming next. If we want to prove that 
the hyperreals have the same consistency as the reals, we just have to 
construct a model of the hyperreals using the reals. This is done in detail 
elsewhere (see Stroyan and Mathforum.org in the references, p. 195). 
I'll just sketch the general idea. A hyperreal number is represented by 
an infinite sequence of real numbers. For example, the sequence 

7, 7, 7, 7, ...

would be the hyperreal version of the number 7. A sequence like 

1,2,3,... 

represents an infinite number, while 

1, 1/2, 1/3, ...

is infinitesimal. All the arithmetic operations are defined by applying
them to the corresponding members of the sequences. For example, the 
sum of the 7, 7, 7, ... sequence and the 1, 2, 3, . . . sequence would be 8, 
9, 10, . . . , which we interpret as a somewhat larger infinite number. 
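
The termwise arithmetic just described can be tried out in a few lines of Python. This is only a sketch under my own assumptions: a sequence is represented by a function from the position n = 1, 2, 3, ... to a real number, only the first few terms are ever inspected, and the names const, add, and first_terms are hypothetical helpers invented for the illustration.

```python
def const(c):
    """The sequence c, c, c, ... (the hyperreal version of the real number c)."""
    return lambda n: c

def add(a, b):
    """Termwise sum of two sequences."""
    return lambda n: a(n) + b(n)

def first_terms(seq, k):
    """The first k terms of a sequence, at positions n = 1..k."""
    return [seq(n) for n in range(1, k + 1)]

seven = const(7)
infinite = lambda n: n            # 1, 2, 3, ... (an infinite number)
infinitesimal = lambda n: 1 / n   # 1, 1/2, 1/3, ... (an infinitesimal)

total = add(seven, infinite)
print(first_terms(total, 3))      # [8, 9, 10]
```

Nothing here decides when two sequences count as the same hyperreal, or which of two is larger; that is the hard part of the construction.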

The big problem in this approach is how to compare hyperreals, because 
a comparison like < is supposed to give an answer that is either true or 
false. It's not supposed to give a hyperreal number as the result. 

It's clear that 8, 9, 10, ... is greater than 1, 1, 1, ..., because every
member of the first sequence is greater than every member of the sec- 
ond one. But is 8, 9, 10, . . . greater than 9, 9, 9, . . . ? We want the 
answer to be "yes," because we're thinking of the first one as an infinite 
number and the second one as the ordinary finite number 9. The first 
sequence is indeed greater than the second at almost every one of the 
infinite number of places at which they could be compared. The only 
place where it loses the contest is at the very first position, and the 
only spot where we get a tie is the second one. Essentially the idea is 
that we want to define a concept of what happens "almost everywhere" 
on some infinite list. If one thing happens in an infinite number of 
places and something else only happens at some finite number of spots, 
then the definition of "almost everywhere" is clear. What's harder is a 
comparison of something like these two sequences: 

2,2,2,2,2,2,2,2,2,2,2,2,2,2,... 

and 

1,3,1,1,3,1,1,1,3,1,1,1,1,3,... 

where the second sequence has longer and longer runs of ones inter- 
spersed between the threes. The two sequences are never equal at any 
position, so clearly they can't be considered to be equal as hyperreal 
numbers. But there is an infinite number of spots in which the first 
sequence is greater than the second, and likewise an infinite number in 
which it's less. It seems as though there are more in which it's greater, 
so we probably want to define the second sequence as being a hyperreal 
number that's less than 2. The problem is that it can be very difficult to 
write down an acceptable definition of this "almost everywhere" notion. 
The answer is very technical, and I won't go into it here, but it can be 
done. Because two sequences could be equal almost everywhere, we end 
up having to define a hyperreal number not as a particular sequence but 
as a set of sequences that are equal to each other almost everywhere. 
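
To get a feeling for why the last pair of sequences is awkward, here is a rough Python sketch that tallies, among the first N positions, how often the sequence 2, 2, 2, ... beats the second sequence and how often it loses. The function names are hypothetical, and a finite tally like this is emphatically not the technical definition of "almost everywhere"; it only shows that both counts keep growing.

```python
def runs_sequence(length):
    """First `length` terms of 1,3,1,1,3,1,1,1,3,...: runs of ones of
    growing length, each run followed by a single three."""
    terms = []
    run = 1
    while len(terms) < length:
        terms.extend([1] * run)
        terms.append(3)
        run += 1
    return terms[:length]

def tally(length):
    """Count the positions where 2,2,2,... is greater than, or less than,
    the corresponding term of the runs sequence."""
    second = runs_sequence(length)
    greater = sum(1 for t in second if 2 > t)
    less = sum(1 for t in second if 2 < t)
    return greater, less

for length in (14, 1000, 100000):
    print(length, tally(length))
```

Both numbers grow without bound as the length increases, but the count of positions where the first sequence wins grows much faster, which matches the intuition that the second sequence ought to be less than 2.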

With the construction of this model, it is possible to prove that the 
hyperreals have the same level of consistency as the reals. 
The transfer principle applied to functions

On page 34, I told you not to worry about whether it was legitimate 
to apply familiar functions like $x^2$, $\sqrt{x}$, $\sin x$, $\cos x$, and $e^x$ to hyperreal
numbers. But since you're reading this, you're obviously in need of more 
reassurance. 

For some of these functions, the transfer principle straightforwardly 
guarantees that they work for hyperreals, have all the familiar proper- 
ties, and can be computed in the same way. For example, the following 
statement is in a suitable form to have the transfer principle applied to 
it: For any real number x, $x \cdot x \ge 0$. Changing "real" to "hyperreal,"
we find out that the square of a hyperreal number is greater than or
equal to zero, just like the square of a real number. Writing it as $x^2$
or calling it a square is just a matter of notation and terminology. The
same applies to this statement: For any real number $x \ge 0$, there exists
a real number y such that $y^2 = x$. Applying the transfer principle to it
tells us that square roots can be defined for the hyperreals as well.

There's a problem, however, when we get to functions like $\sin x$ and
$e^x$. If you look up the definition of the sine function in a trigonometry
textbook, it will be defined geometrically, as the ratio of the lengths of 
two sides of a certain triangle. The transfer principle doesn't apply to 
geometry, only to arithmetic. It's not even obvious intuitively that it 
makes sense to define a sine function on the hyperreals. In an application 
like the differentiation of the sine function on page 28, we only had to 
take sines of hyperreal numbers that were infinitesimally close to real 
numbers, but if the sine is going to be a full-fledged function defined on 
the hyperreals, then we should be allowed, for example, to take the sine 
of an infinite number. What would that mean? If you take the sine of a 
number like a million or a billion on your calculator, you just get some 
apparently random result between $-1$ and $1$. The sine function wiggles
back and forth indefinitely as x gets bigger and bigger, never settling
down to any specific limiting value. Apparently we could have $\sin H = 1$
for a particular infinite H, and then $\sin(H + \pi/2) = 0$, $\sin(H + \pi) = -1$,
and so on.

It turns out that the moral equivalent of the transfer principle can indeed
be applied to any function on the reals, yielding a function that is in
some sense its natural "big brother" on the hyperreals, but the
consequences can be either disturbing or exhilarating depending on your
tastes. For example, consider the function $\lfloor x \rfloor$ that takes a real number
x and rounds it down to the greatest integer that is less than or equal
to x, e.g., $\lfloor 3 \rfloor = 3$ and $\lfloor \pi \rfloor = 3$. This function, like any other real
function, can be extended to the hyperreals, and that means that we
can define the hyperintegers, the set of hyperreals that satisfy $\lfloor x \rfloor = x$.
The hyperintegers include the integers as a subset, but they also include 
infinite numbers. This is likely to seem magical, or even unreasonable, 
if we come at the hyperreals from a purely axiomatic point of view. The 
extension of functions to the hyperreals seems much more natural in 
view of the construction of the hyperreals in terms of sequences given in 
the preceding section. For example, the sequence 1.3, 2.3, 3.3, 4.3, 5.3, . . . 
represents an infinite number. If we apply the $\lfloor x \rfloor$ function to it, we get
1, 2, 3, 4, 5, . . ., which is an infinite integer. 
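
In the sequence picture, extending a real function to the hyperreals just means applying it term by term. Here is a minimal Python sketch of the rounding-down example, assuming (as an illustration only) that a sequence is a function of the position n = 1, 2, 3, ..., and using a made-up helper name, extend.

```python
import math

def extend(f):
    """Turn a function on real numbers into one acting termwise on sequences."""
    return lambda seq: (lambda n: f(seq(n)))

floor_on_sequences = extend(math.floor)

x = lambda n: n + 0.3                 # 1.3, 2.3, 3.3, ... (an infinite number)
rounded = floor_on_sequences(x)       # 1, 2, 3, ... (an infinite integer)

print([rounded(n) for n in range(1, 6)])   # [1, 2, 3, 4, 5]

# Rounding down a second time changes nothing, term by term, which is the
# defining property of a hyperinteger.
again = floor_on_sequences(rounded)
print(all(again(n) == rounded(n) for n in range(1, 1000)))
```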

Proof of the chain rule 

In the statement of the chain rule on page 37, I followed my usual
custom of writing derivatives as dy/dx, when actually the derivative is
the standard part, st(dy/dx). In more rigorous notation, the chain rule
should be stated like this:

$$\mathrm{st}\left(\frac{dz}{dx}\right) = \mathrm{st}\left(\frac{dz}{dy}\right)\mathrm{st}\left(\frac{dy}{dx}\right)$$

The transfer principle allows us to rewrite the left-hand side as
st[(dz/dy)(dy/dx)], and then we can get the desired result using the
identity st(ab) = st(a)st(b).

Derivative of $e^x$

All of the reasoning on page 38 would have applied equally well to any
other exponential function with a different base, such as $2^x$ or $10^x$.
Those functions would have different values of c, so if we want to
determine the value of c for the base-e case, we need to bring in the definition
of e, or of the exponential function $e^x$, somehow.

We can take the definition of $e^x$ to be

$$e^x = \lim_{n \to \infty} \left(1 + \frac{x}{n}\right)^n.$$

The idea behind this relation is similar to the idea of compound interest. 
If the interest rate is 10%, compounded annually, then x = 0.1, and 
the balance grows by a factor (1 + x) = 1.1 in one year. If, instead, 
we want to compound the interest monthly, we can set the monthly 
interest rate to 0.1/12, and then the growth of the balance over a year
is $(1 + x/12)^{12} = 1.1047$, which is slightly larger because the interest from
the earlier months itself accrues interest in the later months. Continuing
this limiting process, we find $e^{0.1} = 1.1052$.
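
The compound-interest numbers are easy to check. The following Python sketch, with a hypothetical function name compounded, evaluates the growth factor for more and more frequent compounding and compares it with the library exponential.

```python
import math

def compounded(x, n):
    """Growth factor (1 + x/n)**n after compounding n times at total rate x."""
    return (1 + x / n) ** n

x = 0.1
for n in (1, 12, 365, 10**6):
    print(n, round(compounded(x, n), 4))
print("exp(0.1) =", round(math.exp(x), 4))
```

Annual compounding gives 1.1, monthly compounding gives 1.1047, and for very large n the result becomes indistinguishable, to four decimal places, from the exponential of 0.1.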

If n is large, then we have a good approximation to the base-e
exponential, so let's differentiate this finite-n approximation and try to
find an approximation to the derivative of $e^x$. The chain rule tells us
that the derivative of $(1 + x/n)^n$ is the derivative of the
raising-to-the-nth-power function, multiplied by the derivative of the inside stuff,
$d(1 + x/n)/dx = 1/n$. We then have

$$\frac{d\left(1 + \frac{x}{n}\right)^n}{dx} = \left[n\left(1 + \frac{x}{n}\right)^{n-1}\right]\frac{1}{n} = \left(1 + \frac{x}{n}\right)^{n-1}.$$

But evaluating this at $x = 0$ simply gives 1, so at $x = 0$, the approximation
to the derivative is exactly 1 for all values of n; it's not even
necessary to imagine going to larger and larger values of n. This
establishes that c = 1, so we have

$$\frac{de^x}{dx} = e^x$$

for all values of x.

Proofs of the generalizations of l'Hôpital's rule

Multiple applications of the rule

Here we prove, as claimed on p. 65, that the form of l'Hôpital's rule
given on p. 60 can be generalized to the case where more than
one application of the rule is required. The proof requires material
from ch. 4 (integration and the mean value theorem), and, as discussed
in example 83 on p. 110, the motivation for the result becomes much
more transparent once one has read ch. 7 and knows about Taylor series.
The reader who has arrived here while reading ch. 3 will need to defer
reading this section of the proof until after ch. 4, and may wish to wait
until after ch. 7.

The proof can be broken down into two steps. 

Step 1: We first have to establish a stronger form of l'Hôpital's rule that
states that $\lim u/v = \lim \dot{u}/\dot{v}$ rather than $\lim u/v = \dot{u}/\dot{v}$. This form is
stronger, because in a case like example 44 on p. 65, $\dot{u}/\dot{v}$ isn't defined,
but $\lim \dot{u}/\dot{v}$ is.

We prove the stronger form using the mean value theorem (p. 74). For
simplicity of notation, let's assume that the limit is being taken at x = 0.
By the fundamental theorem of calculus, we have $u(x) = \int_0^x \dot{u}(x')\,dx'$,
and the mean value theorem then tells us that for some p between 0 and
x, $u(x) = x\dot{u}(p)$. Likewise for a q in this interval, $v(x) = x\dot{v}(q)$. So

$$\lim_{x \to 0} \frac{u}{v} = \lim_{x \to 0} \frac{\dot{u}(p)}{\dot{v}(q)},$$

but since both p and q are closer to zero than x is, the limit as they
simultaneously approach zero is the same as the limit as x approaches
zero.


Step 2: If we need to take n derivatives, the proof follows by applying 
the extra-strength rule n times. 4 

Change of variable 

We will build up the rest of the features of l'Hôpital's rule using the
technique of a change of variable. To demonstrate how this works, let's
imagine that we were starting from an even more stripped-down version
of l'Hôpital's rule than the one on p. 60. Say we only knew how to do
limits of the form $x \to 0$ rather than $x \to a$ for an arbitrary real number
a. We could then evaluate $\lim_{x \to a} u/v$ simply by defining $t = x - a$ and
reexpressing u and v in terms of t.



Example 100

> Reduce

$$\lim_{x \to \pi} \frac{\sin x}{x - \pi}$$

to a form involving a limit at 0.

4 There is a logical subtlety here, which is that although we've given a clearcut 
recipe for cooking up a proof for any given n, that isn't quite the same thing as 
proving it for any positive integer n. This is an example where what we really need 
is a technique called proof by induction. In general, proof by induction works like 
this. Suppose we prove some statement about the integer 1, e.g., that l'Hôpital's
rule is valid when you take 1 derivative. Now say that we can also prove that if that
statement holds for a given n, it also holds for n + 1. Proof by induction means that
we can then consider the statement as having been proved for all positive integers.
For suppose the contrary. Then there would be some least n for which it failed, but
this would be a contradiction, since it would hold for $n - 1$.
> Define $t = x - \pi$. Solving for x gives $x = t + \pi$. We substitute into the above
expression to find

$$\lim_{x \to \pi} \frac{\sin x}{x - \pi} = \lim_{t \to 0} \frac{\sin(t + \pi)}{t}.$$

If all we knew was the $\to 0$ form of l'Hôpital's rule, then this would suffice to
reduce the problem to one we knew how to solve. In fact, this kind of change of
variable works in all cases, not just for a limit at $\pi$, so rather than going through
a laborious change of variable every time, we could simply establish the more
general form on p. 60, with $\to a$.
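
A quick numerical check makes the change of variable concrete. The Python sketch below (the function names original and substituted are mine, chosen for the illustration) evaluates the expression in x a small step away from pi, and the expression in t the same small step away from 0.

```python
import math

def original(x):
    """The quotient in terms of x, whose limit is wanted at x = pi."""
    return math.sin(x) / (x - math.pi)

def substituted(t):
    """The same quotient after the change of variable t = x - pi."""
    return math.sin(t + math.pi) / t

for step in (1e-1, 1e-3, 1e-5):
    print(step, original(math.pi + step), substituted(step))
```

The two columns agree, up to rounding in the floating-point arithmetic, and both appear to approach -1, which is what the rule gives when it is applied to the form with the limit at 0.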

The indeterminate form $\infty/\infty$

To prove that l'Hôpital's rule works in general for $\infty/\infty$ forms, we do a
change of variable on the outputs of the functions u and v rather than
their inputs. Suppose that our original problem is of the form

$$\lim \frac{u}{v},$$

where both functions blow up. 5 We then define U = 1/u and V = 1/v.
We now have

$$\lim \frac{u}{v} = \lim \frac{1/U}{1/V} = \lim \frac{V}{U},$$

and since U and V both approach zero, we have reduced the problem
to one that can be solved using the version of l'Hôpital's rule already
proved for indeterminate forms like 0/0. Differentiating and applying
the chain rule, we have

$$\lim \frac{u}{v} = \lim \frac{\dot{V}}{\dot{U}} = \lim \frac{-v^{-2}\dot{v}}{-u^{-2}\dot{u}}.$$

Since $\lim ab = \lim a \lim b$ provided that $\lim a$ and $\lim b$ are both defined,
we can rearrange factors to produce the desired result.
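
Here is a small numerical illustration of the trick, using an example of my own choosing rather than one from the text: u = 2/x + 1 and v = 1/x, both of which blow up as x approaches zero from the right. The derivatives are written out by hand, and all the names are hypothetical.

```python
def u(x):
    return 2 / x + 1

def v(x):
    return 1 / x

def u_dot(x):
    return -2 / x**2      # derivative of u, worked out by hand

def v_dot(x):
    return -1 / x**2      # derivative of v, worked out by hand

def ratios(x):
    """Return V/U (which equals u/v) and the ratio of the derivatives."""
    U, V = 1 / u(x), 1 / v(x)   # U and V both shrink toward zero
    return V / U, u_dot(x) / v_dot(x)

for x in (1e-1, 1e-3, 1e-6):
    print(x, 1 / u(x), 1 / v(x), *ratios(x))
```

As x shrinks, U and V both go to zero, V/U approaches 2, and the ratio of the derivatives of u and v is 2 as well, in agreement with the rule.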

This change of variable is a specific example of a much more general 
method of problem-solving in which we look for a way to reduce a hard 
problem to an easier one. We will encounter changes of variable again on 
p. 85 as a technique for integration, which means undoing the operation 
of differentiation. 



5 Think about what happens when only u blows up, or only v.

Proof of the fundamental theorem of calculus

There are three parts to the proof: (1) Take the equation that states
the fundamental theorem, differentiate both sides with respect to b, and
show that they're equal. (2) Show that continuous functions with equal
derivatives must be essentially the same function, except for an additive
constant. (3) Show that the constant in question is zero.



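1. By the definition of the indefinite integral, the derivative of $x(b) - x(a)$
with respect to b equals $\dot{x}(b)$. We have to establish that this equals the
following:

$$\begin{aligned}
\frac{d}{db}\int_a^b \dot{x}(t)\,dt
  &= \mathrm{st}\,\frac{1}{db}\left[\int_a^{b+db} \dot{x}(t)\,dt - \int_a^b \dot{x}(t)\,dt\right] \\
  &= \mathrm{st}\,\frac{1}{db}\int_b^{b+db} \dot{x}(t)\,dt \\
  &= \mathrm{st}\,\frac{1}{db}\lim_{H \to \infty}\sum_{i=0}^{H} \dot{x}(b + i\,db/H)\,\frac{db}{H} \\
  &= \mathrm{st}\lim_{H \to \infty}\frac{1}{H}\sum_{i=0}^{H} \dot{x}(b + i\,db/H)
\end{aligned}$$

Since $\dot{x}$ is continuous, all the values of $\dot{x}$ occurring inside the sum can
differ only infinitesimally from $\dot{x}(b)$. Therefore the quantity inside the
limit differs only infinitesimally from $\dot{x}(b)$, and the standard part of its
limit must be $\dot{x}(b)$. 6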
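
The last step can be imitated numerically. In the Python sketch below, a small real number db stands in for the infinitesimal, a large finite H stands in for the limit, and the cosine plays the role of the continuous function being integrated; all three choices, and the name sampled_average, are assumptions made only for the sake of illustration.

```python
import math

def sampled_average(x_dot, b, db, H):
    """(1/H) times the sum of x_dot(b + i*db/H) for i = 0..H, i.e., the
    quantity inside the limit in the final line of the derivation."""
    return sum(x_dot(b + i * db / H) for i in range(H + 1)) / H

b = 1.0
for db in (1e-2, 1e-4, 1e-6):
    print(db, sampled_average(math.cos, b, db, 10_000), math.cos(b))
```

Every sample is taken within db of b, so by continuity each term is close to the value of the function at b, and so is the average. (The sum has H + 1 terms but is divided by H; that discrepancy disappears as H grows.)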



2. Suppose f and g are two continuous functions whose derivatives are
equal. Then $d = f - g$ is a continuous function whose derivative is zero.
But the only continuous function with a derivative of zero is a constant,
so f and g differ by at most an additive constant.



3. I've established that the derivatives with respect to b of $x(b) - x(a)$
and $\int_a^b \dot{x}\,dt$ are the same, so they differ by at most an additive constant.
But at b = a, they're both zero, so the constant must be zero.



6 If you don't want to use infinitesimals, then you can express the derivative as a 
limit, and in the final step of the argument use the mean value theorem, introduced 
later in the chapter. 
The intermediate value theorem

On page 54 I asserted that the intermediate value theorem was really 
more a statement about the (real or hyperreal) number system than 
about functions. For insight, consider figure c, which is a geometrical 
construction that constitutes the proof of the very first proposition in 
Euclid's celebrated Elements. The proposition to be proved is that given 
a line segment AB, it is possible to construct an equilateral triangle with 
AB as its base. The proof is by construction; that is, Euclid doesn't 
just give a logical argument that convinces us the triangle must exist, 
he actually demonstrates how to construct it. First we draw a circle 
with center A and radius AB, which his third postulate says we can do. 
Then we draw another circle with the same radius, but centered at B. 
Pick one of the intersections of the circles and call it C. Construct the 
line segments AC and BC (postulate 1). Then AC equals AB by the 
definition of the circle, and likewise BC equals AB. Euclid also has an 
axiom that things equal to the same thing are equal to one another, so 
it follows that AC equals BC, and therefore the triangle is equilateral. 




c / A proof from Euclid's Elements. 

It seems like a model of mathematical rigor, but there's a flaw in the
reasoning, which is that he assumes without justification that the circles
do have a point in common. To see that this is not as secure an
assumption as it seems, consider the usual Cartesian representation of 
plane geometry in terms of coordinates (x, y). Usually we assume that x 
and y are real numbers. What if we instead do our Cartesian geometry 
using rational numbers as coordinates? Euclid's five postulates are all 
consistent with this. For example, circles do exist. Let A = (0,0) and 
B = (1,0). Then there are infinitely many pairs of rational numbers in 
the set that satisfies the definition of the circle centered at A. Examples 
include (3/5, 4/5) and (-7/25, 24/25). The circle is also continuous in
the sense that if I specify a point on it such as (-7/25, 24/25), and a
distance that I'm allowed to make as small as I please, say $10^{-6}$, then
other points exist on the circle within that distance of the given point.
However, the intersection assumed by Euclid's proof doesn't exist. It
would lie at $(1/2, \sqrt{3}/2)$, but $\sqrt{3}$ doesn't exist in the rational number
system.
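
These claims about the rational circle can be checked with exact arithmetic. The Python sketch below uses the standard-library Fraction type; the parametrization of the circle by a rational slope m is a well-known trick that I'm bringing in for the illustration (it isn't part of the argument above), and the function names are hypothetical.

```python
from fractions import Fraction
from math import isqrt

def on_unit_circle(x, y):
    """Exact test of x^2 + y^2 = 1."""
    return x * x + y * y == 1

def rational_point(m):
    """Rational point on the unit circle for a rational parameter m, where
    m is the slope of the line joining the point to (-1, 0)."""
    m = Fraction(m)
    return (1 - m * m) / (1 + m * m), 2 * m / (1 + m * m)

def is_rational_square(q):
    """True if the nonnegative Fraction q is the square of a rational.
    A Fraction is stored in lowest terms, so this holds exactly when the
    numerator and denominator are both perfect squares."""
    n, d = q.numerator, q.denominator
    return isqrt(n) ** 2 == n and isqrt(d) ** 2 == d

p = rational_point(Fraction(4, 3))
print(p, on_unit_circle(*p))            # the point (-7/25, 24/25)

# A different rational point on the circle, within 10^-6 of p.
near = rational_point(Fraction(4, 3) + Fraction(1, 10**7))
dist2 = (near[0] - p[0]) ** 2 + (near[1] - p[1]) ** 2
print(on_unit_circle(*near), 0 < dist2 < Fraction(1, 10**12))

# The two circles would have to meet at x = 1/2, where y^2 = 3/4.
print(is_rational_square(1 - Fraction(1, 2) ** 2))   # False
```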

In exactly the same way, we can construct counterexamples to the in- 
termediate value theorem if the underlying system of numbers doesn't 
have the same properties as the real numbers. For example, let $y = x^2$.
Then y is a continuous function on the interval from 0 to 1, but if
we take the rational numbers as our foundation, then there is no x for
which y = 1/2. The solution would be $x = 1/\sqrt{2}$, which doesn't exist in
the rational number system. Notice the similarity between this problem
and the one in Euclid's proof. In both cases we have curves that cut
one another without having an intersection. In the present example, the
curves are the graphs of the functions $y = x^2$ and y = 1/2.

The interpretation is that the real numbers are in some sense more 
densely packed than the rationals, and with two thousand years' worth of
hindsight, we can see that Euclid should have included a sixth postulate 
that expressed this density property. One possible way of stating such 
a postulate is the following. Let L be a ray, and O its endpoint. We 
think of O as the origin of the positive number line. Let P and Q be 
sets of points on L such that every point in P is closer to O than every 
point in Q. Then there exists some point Z on L such that Z lies at 
least as far from O as every point in P, but no farther than any point in 
Q. Technically this property is known as completeness. As an example, 
let $P = \{x \mid x^2 < 2\}$ and $Q = \{x \mid x^2 > 2\}$. Then the point Z would
have to be $\sqrt{2}$, which shows that the rationals are not complete. The
reals are complete, and the completeness axiom can serve as one of the 
fundamental axioms of the real numbers. 

Note that the axiom refers to sets P and Q, and says that a certain 
fact is true for any choice of those sets; it therefore isn't the type of 
proposition that is covered by the transfer principle, and in fact it fails 
for the hyperreals, as we can see if P is the set of all infinitesimals and 
Q the positive real numbers. 

Here is a skeletal proof of the intermediate value theorem, in which I'll 
make some simplifying assumptions and leave out some cases. We want 
to prove that if y is a continuous real-valued function on the real interval
from a to b, and if y takes on values $y_1$ and $y_2$ at certain points within
this interval, then for any $y_3$ between $y_1$ and $y_2$, there is some real x in
the interval for which $y(x) = y_3$. I'll assume the case in which $x_1 < x_2$
and $y_1 < y_2$. Define sets of real numbers $P = \{x \mid y \le y_3\}$, and let
$Q = \{x \mid y \ge y_3\}$. For simplicity, I'll assume that every member of P is
less than or equal to every member of Q, which happens, for example,
if the function y(x) is always increasing on the interval [a, b]. If P and
Q intersect, then the theorem holds. Suppose instead that P and Q do
not intersect. Using the completeness axiom, there exists some real x
which is greater than or equal to every element of P and less than or
equal to every element of Q. Suppose x belongs to P. Then the following
statement is in the right form for the transfer principle to apply to it:
for any number x' > x, $y(x') > y_3$. We can conclude that the statement
is also true for the hyperreals, so that if dx is a positive infinitesimal and
x' = x + dx, we have $y(x) < y_3$, but $y(x + dx) > y_3$. Then by continuity,
$y(x) - y(x + dx)$ is infinitesimal. But $y(x) < y_3$ and $y(x + dx) > y_3$, so
the standard part of y(x) must equal $y_3$. By assumption y takes on real
values for real arguments, so $y(x) = y_3$. The same reasoning applies if
x belongs to Q, and since x must belong either to P or to Q, the result
is proved.
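
For an increasing function, the point singled out by the completeness axiom can be hunted down numerically by bisection. The Python sketch below is not the proof given above, just a floating-point analogue of it under the same simplifying assumption that y is continuous and increasing; the function name and the tolerance are my own choices.

```python
def bisect(y, a, b, y3, tol=1e-12):
    """Narrow down an x in [a, b] with y(x) = y3, assuming that y is
    continuous and increasing and that y(a) <= y3 <= y(b)."""
    lo, hi = a, b
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if y(mid) <= y3:
            lo = mid      # mid is on the P side: y does not exceed y3 there
        else:
            hi = mid      # mid is on the Q side: y exceeds y3 there
    return lo

x = bisect(lambda x: x * x, 0.0, 1.0, 0.5)
print(x, x * x)
```

The interval between the P side and the Q side is halved at every step, so it shrinks down onto a single real number, here approximately 0.7071, at which the function takes the desired value to within rounding error.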

For an alternative proof of the intermediate value theorem by an entirely 
different technique, see Keisler (references, p. 195). 

As a side issue, we could ask whether there is anything like the interme- 
diate value theorem that can be applied to functions on the hyperreals. 
Our definition of continuity on page 53 explicitly states that it only 
applies to real functions. Even if we could apply the definition to a 
function on the hyperreals, the proof given above would fail, since the 
hyperreals lack the completeness property. As a counterexample, let $\epsilon$
be some positive infinitesimal, and define a function y such that $y = -\epsilon$
when st(x) < 0 and $y = \epsilon$ everywhere else. If we insist on applying
the definition of continuity to this function, it appears to be continuous, 
so it violates the intermediate value theorem. Note, however, that the 
way this function is defined is different from the way we usually define 
functions on the hyperreals. Usually we define a function on the reals, 
say $y = x^2$, in language to which the transfer principle applies, and then
we use the transfer principle to reason about the function's analog on
the hyperreals. For instance, the function $y = x^2$ has the property that
$y \ge 0$ everywhere, and the transfer principle guarantees that that's also
true if we take $y = x^2$ as the definition of a function on the hyperreals.
For functions defined in this way, the intermediate value theorem makes 
a statement that the transfer principle applies to, and it is therefore 
true for the hyperreal version of the function as well. 

The extreme value theorem was stated on page 56. Before we can prove
it, we need to establish some preliminaries, which turn out to be inter- 
esting for their own sake. 

Definition: Let C be a subset of the real numbers whose definition can 
be expressed in the type of language to which the transfer principle 
applies. Then C is compact if for every hyperreal number x satisfying 
the definition of C, the standard part of x exists and is a member of C. 

To understand the content of this definition, we need to look at the two 
ways in which a set could fail to satisfy it. 

First, suppose U is defined by x > 0. Then there are positive infinite 
hyperreal numbers that satisfy the definition, and their standard part is 
not defined, so U is not compact. The reason U is not compact is that 
it is unbounded. 

Second, let V be defined by 0 ≤ x < 1. Then if dx is a positive
infinitesimal, 1 - dx satisfies the definition of V, but its standard part is 1,
which is not in V, so V is not compact. The set V has boundary points at 0
and 1, and the reason it is not compact is that it doesn't contain its
right-hand boundary point. A boundary point is a real number which 
is infinitesimally close to some points inside the set, and also to some 
other points that are on the outside. 

We therefore arrive at the following alternative characterization of the 
notion of a compact set, whose proof is straightforward. 

Theorem: A set is compact if and only if it is bounded and contains all 
of its boundary points. 

Intuitively, the reason compact sets are interesting is that if you're stand- 
ing inside a compact set and start taking steps in a certain direction, 
without ever turning around, you're guaranteed to approach some point 
in the set as a limit. (You might step over some gaps that aren't in- 
cluded in the set.) If the set was unbounded, you could just walk forever 
at a constant speed. If the set didn't contain its boundary point, then 
you could asymptotically approach the boundary, but the goal you were 
approaching wouldn't be a member of the set. 

The following theorem turns out to be the most difficult part of the 
discussion. 

Theorem: A compact set contains its maximum and minimum. 

Proof: Let C be a compact set. We know it's bounded, so let M be the 
set of all real numbers that are greater than any member of C. By the
completeness property of the real numbers, there is some real number x
between C and M. Let *C be the set of hyperreal numbers that satisfies
the same definition that C does.

Every real x' greater than x fails to satisfy the condition that defines
C, and by the transfer principle the same must be true if x' is any
hyperreal, so if dx is a positive infinitesimal, x + dx must be outside of
*C.

But now consider x - dx. The following statement holds for the reals:
there is no number x' < x that is greater than every member of C. By
the transfer principle, we find that there is some hyperreal number q
in *C that is greater than x - dx. But the standard part of q must
equal x, for otherwise st q would be a member of C that was greater
than x. Therefore x is a boundary point of C, and since C is compact,
x is a member of C. We conclude C contains its maximum. A similar
argument shows that C contains its minimum, so the theorem is proved.

There were two subtle things about this proof. The first was that we
ended up constructing the set of hyperreals *C, which was the hyperreal
"big brother" of the real set C. This is exactly the sort of thing that the
transfer principle does not guarantee we can do. However, if you look
back through the proof, you can see that *C is used only as a notational
convenience. Rather than talking about whether a certain number was a
member of *C, we could have referred, more cumbersomely, to whether
or not it satisfied the condition that had originally been used to define
C. The price we paid for this was a slight loss of generality. There
are so many different sets of real numbers that they can't possibly all
have explicit definitions that can be written down on a piece of paper.
However, there is very little reason to be interested in studying the
properties of a set that we were never able to define in the first place.
The other subtlety was that we had to construct the auxiliary point
x - dx, but there was not much we could actually say about x - dx
itself. In particular, it might or might not have been a member of *C.
For example, if C is defined by the condition x = 0, then *C likewise
contains only the single element 0, and x - dx is not a member of *C.
But if C is defined by 0 ≤ x ≤ 1, then x - dx is a member of *C.

The original goal was to prove the extreme value theorem, which is a 
statement about continuous functions, but so far we haven't said any- 
thing about functions. 

Lemma: Let f be a real function defined on a set of points C. Let D be
the image of C, i.e., the set of all values f(x) that occur for some x in
C. Then if f is continuous and C is compact, D is compact as well. In
other words, continuous functions take compact sets to compact sets.

Proof: Let y = f(x) be any hyperreal output corresponding to a
hyperreal input x in *C. We need to prove that the standard part of y
exists, and is a member of D. Since C is compact, the standard part
of x exists and is a member of C. But then by continuity y differs only
infinitesimally from f(st x), which is real, so st y = f(st x) is defined and
is a member of D.



We are now ready to prove the extreme value theorem, in a version 
slightly more general than the one originally given on page 56. 



The extreme value theorem: Any continuous function on a compact set 
achieves a maximum and minimum value, and does so at specific points 
in the set. 



Proof: Let f be continuous, and let C be the compact set on which
we seek its maximum and minimum. Then the image D as defined in
the lemma above is compact. Therefore D contains its maximum and
minimum values. Since every member of D is f(x) for some x in C,
these values are achieved at specific points in C.



Proof of the mean value theorem 



Suppose that the mean value theorem is violated. Let L be the set of all
x in the interval from a to b such that y(x) < ȳ, and likewise let M be
the set with y(x) > ȳ. If the theorem is violated, then the union of these
two sets covers the entire interval from a to b. Neither one can be empty;
if, for example, M was empty, then we would have y < ȳ everywhere
and also ∫ y = ∫ ȳ, but it follows directly from the definition of the
definite integral that when one function is less than another, its integral
is also less than the other's. Since y takes on values less than and greater
than ȳ, it follows from the intermediate value theorem that y takes on
the value ȳ somewhere (intuitively, at a boundary between L and M).

Proof of the fundamental theorem of algebra

We start with the following lemma, which is intuitively obvious, because 
polynomials don't have asymptotes. Its proof is given after the proof of 
the main theorem. 

Lemma: For any polynomial P(z) in the complex plane, its magnitude
|P(z)| achieves its minimum value at some specific point z_0.

The fundamental theorem of algebra: In the complex number system, a
nonzero nth-order polynomial has exactly n roots, i.e., it can be factored
into the form P(z) = (z - a_1)(z - a_2) ... (z - a_n), where the a_i are complex
numbers.

Proof: The proofs in the cases of n = 0 and 1 are trivial, so our strategy
is to reduce higher-n cases to lower ones. If an nth-degree polynomial P
has at least one root, a, then we can always reduce it to a polynomial of
degree n - 1 by dividing it by (z - a). Therefore the theorem is proved
by induction provided that we can show that every polynomial of degree
greater than zero has at least one root.

Suppose, on the contrary, that there is an nth order polynomial P(z),
with n > 0, that has no roots at all. Then by the lemma |P| achieves
its minimum value at some point z_0. To make things more simple and
concrete, we can construct another polynomial Q(z) = P(z + z_0)/P(z_0),
so that |Q| has a minimum value of 1, achieved at Q(0) = 1. This means
that Q's constant term is 1. What about its other terms? Let Q(z) = 1 +
c_1 z + ... + c_n z^n. Suppose c_1 was nonzero. Then for infinitesimally small
values of z, the terms of order z^2 and higher would be negligible, and
we could make Q(z) be a real number less than one by an appropriate
choice of z's argument. Therefore c_1 must be zero. But that means that
if c_2 is nonzero, then for infinitesimally small z, the z^2 term dominates
the z^3 and higher terms, and again this would allow us to make Q(z) be
real and less than one for appropriately chosen values of z. Continuing
this process, we find that Q(z) has no terms at all beyond the constant
term, i.e., Q(z) = 1. This contradicts the assumption that n was greater
than zero, so we've proved by contradiction that there is no P with the
properties claimed.
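
The key step of the argument, choosing the argument of z so that the lowest-order nonzero term pushes Q below 1, can be tried numerically. The following is only an illustrative sketch in Python: the coefficients are a made-up example, the helper Q is a hypothetical name, and an ordinary small real number r stands in for the infinitesimal magnitude of z.

```python
import cmath

def Q(z, coeffs):
    """Q(z) = 1 + c_1 z + c_2 z^2 + ...; coeffs[0] is c_1."""
    return 1 + sum(c * z ** (k + 1) for k, c in enumerate(coeffs))

# Hypothetical example: c_1 = 0, c_2 = 2 + 1j, c_3 = 5.
coeffs = [0, 2 + 1j, 5]

# Find the lowest-order nonzero coefficient c_k.
k, c = next((i + 1, c) for i, c in enumerate(coeffs) if c != 0)

# Choose arg z so that c_k z^k is real and negative; r is small but finite.
r = 1e-3
z = cmath.rect(r, (cmath.pi - cmath.phase(c)) / k)

print(abs(Q(z, coeffs)))   # slightly less than 1
```

With this choice the z^2 term contributes about -|c_2| r^2, while the z^3 term is smaller by a further factor of order r, so it cannot undo the decrease.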

Uninteresting proof of the lemma: Let M(r) be the minimum value of
|P(z)| on the disk defined by |z| ≤ r. We first prove that M(r) can't
asymptotically approach a minimum as r approaches infinity. Suppose
to the contrary: for every r, there is some r' > r with M(r') < M(r).
Then by the transfer principle, the same would have to be true for
hyperreal values of r. But it's clear that if r is infinite, the lower-order
terms of P will be infinitesimally small compared to the highest-order
term, and therefore M(r) is infinite for infinite values of r, which is
a contradiction, since by construction M is decreasing, and finite for
finite r. We can therefore conclude by the extreme value theorem that
M achieves its minimum for some specific value of r. The least such r
describes a circle |z| = r in the complex plane, and the minimum of |P|
on this circle must be the same as its global minimum. Applying the
extreme value theorem to |P(z)| as a function of arg z on the interval
0 ≤ arg z ≤ 2π, we establish the desired result.

B Answers and solutions



Answers to Self-Checks 

Answers to self-checks for chapter 4 

page 78, self-check 1: 

The area under the curve from 130 to 135 cm is about 3/4 of a rectangle.
The area from 135 to 140 cm is about 1.5 rectangles. The number of people
in the second range is about twice as much. We could have converted
these to actual probabilities (1 rectangle = 5 cm × 0.005 cm^-1 = 0.025),
but that would have been pointless, because we were just going to compare
the two areas.

Answers to self-checks for chapter 6 

page 118, self-check 1: Say we're looking for u = √z, i.e., we want a
number u that, multiplied by itself, equals z. Multiplication multiplies
the magnitudes, so the magnitude of u can be found by taking the square
root of the magnitude of z. Since multiplication also adds the arguments
of the numbers, squaring a number doubles its argument. Therefore we
can simply divide the argument of z by two to find the argument of
u. This results in one of the square roots of z. There is another one,
which is -u, since (-u)^2 is the same as u^2. This may seem a little odd:
if u was chosen so that doubling its argument gave the argument of z,
then how can the same be true for -u? Well, for example, suppose the
argument of z is 4°. Then arg u = 2°, and arg(-u) = 182°. Doubling
182 gives 364, which is actually a synonym for 4 degrees.
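
The recipe above (take the square root of the magnitude and halve the argument) can be tried numerically. This is a minimal sketch in Python, not part of the original answer; the function name principal_sqrt is made up for the illustration, and the standard cmath module supplies the polar conversions.

```python
import cmath
import math

def principal_sqrt(z):
    """Square root of z built from its magnitude and argument."""
    r, theta = cmath.polar(z)            # magnitude and argument in radians
    return cmath.rect(math.sqrt(r), theta / 2)

z = cmath.rect(1.0, math.radians(4))     # unit magnitude, argument 4 degrees
u = principal_sqrt(z)

# The argument of u is half that of z, and both u and -u square to z.
print(math.degrees(cmath.phase(u)))
print(abs(u * u - z), abs((-u) * (-u) - z))
```

Note that cmath.phase reports arguments in the range from -180° to 180°, so it would describe -u by an angle of about -178°, which is the same direction as 182°.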

Solutions to homework problems 



Solutions for chapter 1 



page 21, problem 1: 



The tangent line has to pass through the point (3,9), and it also seems, 
at least approximately, to pass through (1.5,0). This gives it a slope of 
(9 - 0)/(3 - 1.5) = 9/1.5 = 6, and that's exactly what 2t is at t = 3. 




a / Problem 1.



page 21, problem 2: 



The tangent line has to pass through the point (0, sin(e^0)) = (0, 0.84),
and it also seems, at least approximately, to pass through (-1.6,0). This 
gives it a slope of (0.84 - 0)/(0 - (-1.6)) = 0.84/1.6 = 0.53. The more 
accurate result given in the problem can be found using the methods of 
chapter 2. 

b / Problem 2.

page 21, problem 3: 

The derivative is a rate of change, so the derivatives of the constants
1 and 7, which don't change, are clearly zero. The derivative can be
interpreted geometrically as the slope of the tangent line, and since the
functions t and 7t are lines, their derivatives are simply their slopes, 1
and 7. All of these could also have been found using the formula that
says the derivative of t^k is kt^(k-1), but it wasn't really necessary to get
that fancy. To find the derivative of t^2, we can use the formula, which
gives 2t. One of the properties of the derivative is that multiplying a
function by a constant multiplies its derivative by the same constant, so
the derivative of 7t^2 must be (7)(2t) = 14t. By similar reasoning, the
derivatives of t^3 and 7t^3 are 3t^2 and 21t^2, respectively.

page 21, problem 4: 

One of the properties of the derivative is that the derivative of a sum is
the sum of the derivatives, so we can get this by adding up the derivatives
of 3t^7, -4t^2, and 6. The derivatives of the three terms are 21t^6, -8t,
and 0, so the derivative of the whole thing is 21t^6 - 8t.
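
The term-by-term rule used here is mechanical enough to write as a few lines of code. The sketch below is only an illustration in Python; representing a polynomial as a dictionary from power to coefficient, and the function name differentiate, are choices made for this example.

```python
def differentiate(coeffs):
    """Differentiate a polynomial given as {power: coefficient}.

    Each term c*t^k becomes (k*c)*t^(k-1); constant terms drop out.
    """
    return {k - 1: k * c for k, c in coeffs.items() if k != 0}

# 3t^7 - 4t^2 + 6, the polynomial discussed in this solution
print(differentiate({7: 3, 2: -4, 0: 6}))

# 7t^3, from problem 3
print(differentiate({3: 7}))
```

The first call returns the coefficients of 21t^6 - 8t, and the second returns those of 21t^2.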

page 21, problem 5: 

This is exactly like problem 4, except that instead of explicit numerical 
constants like 3 and -4, this problem involves symbolic constants a, b,
and c. The result is 2at + b. 



page 21, problem 6: 

The first thing that comes to mind is 3t. Its graph would be a line with 
a slope of 3, passing through the origin. Any other line with a slope of 
3 would work too, e.g., 3t + 1.

page 21, problem 7:

Differentiation lowers the power of a monomial by one, so to get something
with an exponent of 7, we need to differentiate something with an
exponent of 8. The derivative of t^8 would be 8t^7, which is eight times
too big, so we really need (t^8/8). As in problem 6, any other function
that differed by an additive constant would also work, e.g., (t^8/8) + 1.

page 21, problem 8: 

This is just like problem 7, but we need something whose derivative 
is three times bigger. Since multiplying by a constant multiplies the 
derivative by the same constant, the way to accomplish this is to take 
the answer to problem 7, and multiply by three. A possible answer is 
(3/8)t^8, or that function plus any constant.

page 21, problem 9: 

This is just a slight generalization of problem 8. Since the derivative 
of a sum is the sum of the derivatives, we just need to handle each 
term individually, and then add up the results. The answer is (3/8)t^8 -
(4/3)t^3 + 6t, or that function plus any constant.

page 21, problem 10: 

The function v = (4/3)π(ct)^3 looks scary and complicated, but it's
nothing more than a constant multiplied by t^3, if we rewrite it as v =
[(4/3)πc^3] t^3. The whole thing in square brackets is simply one big
constant, which just comes along for the ride when we differentiate.
The result is dv/dt = [(4/3)πc^3] (3t^2), or, simplifying, dv/dt = (4πc^3) t^2. (For
further physical insight, we can factor this as [4π(ct)^2] c, where ct is the
radius of the expanding sphere, and the part in brackets is the sphere's
surface area.)

For purposes of checking the units, we can ignore the unitless
constant 4π, which just leaves c^3 t^2. This has units of
(meters per second)^3 (seconds)^2, which works out to be cubic meters per
second. That makes sense, because it tells us how quickly a volume is 
increasing over time. 

page 21, problem 11: 

This is similar to problem 10, in that it looks scary, but we can rewrite
it as a simple monomial, K = (1/2)mv^2 = (1/2)m(at)^2 = (ma^2/2)t^2.
The derivative is (ma^2/2)(2t) = ma^2 t. The car needs more and more
power to accelerate as its speed increases.

To check the units, we just need to show that the expression ma^2 t has
units that are like those of the original expression for K, but divided
by seconds, since it's a rate of change of K over time. This indeed
works out, since the only change in the factors that aren't unitless is
the reduction of the power of t from 2 to 1.

page 22, problem 12: 

The area is A = ℓ^2 = (1 + αT)^2 ℓ_0^2. To make this into something we know
how to differentiate, we need to square out the expression involving T,
and make it into something that is expressed explicitly as a polynomial:

A = ℓ_0^2 + 2ℓ_0^2 αT + ℓ_0^2 α^2 T^2

Now this is just like problem 5, except that the constants superficially
look more complicated. The result is

dA/dT = 2ℓ_0^2 α + 2ℓ_0^2 α^2 T
      = 2ℓ_0^2 (α + α^2 T)

We expect the units of the result to be area per unit temperature, e.g.,
square meters per degree. This is a little tricky, because we have to
figure out what units are implied for the constant α. Since the question
talks about 1 + αT, apparently the quantity αT is unitless. (The 1 is
unitless, and you can't add things that have different units.) Therefore
the units of α must be "per degree," or inverse degrees. It wouldn't
make sense to add α and α^2 T unless they had the same units (and
you can check for yourself that they do), so the whole thing inside the
parentheses must have units of inverse degrees. Multiplying by the ℓ_0^2
in front, we have units of area per degree, which is what we expected.

page 22, problem 13: 

The first derivative is 6t^2 - 1. Going again, the answer is 12t.

page 22, problem 14: 

The first derivative is 3t^2 + 2t, and the second is 6t + 2. Setting this equal
to zero and solving for t, we find t = -1/3. Looking at the graph, it
does look like the concavity is down for t < -1/3, and up for t > -1/3.

page 22, problem 15: 

I chose k = -1, and t = 1. In other words, I'm going to check the slope
of the function x = t^-1 = 1/t at t = 1, and see whether it really equals
kt^(k-1) = -1. Before even doing the graph, I note that the sign makes
sense: the function 1/t is decreasing for t > 0, so its slope should indeed
be negative.

c / Problem 14.




d / Problem 15. 



The tangent line seems to connect the points (0,2) and (2,0), so its slope
does indeed look like it's -1.

The problem asked us to consider the logical meaning of the two possible
outcomes. If the slope had been significantly different from -1
given the accuracy of our result, the conclusion would have been that it 
was incorrect to extend the rule to negative values of k. Although our 
example did come out consistent with the rule, that doesn't prove the 
rule in general. An example can disprove a conjecture, but can't prove 
it. Of course, if we tried lots and lots of examples, and they all worked, 
our confidence in the conjecture would be increased.

page 22, problem 16: 

A minimum would occur where the derivative was zero. First we rewrite 
the function in a form that we know how to differentiate: 



E(r) = k a^12 r^-12 - 2k a^6 r^-6

We're told to have faith that the derivative of t^k is kt^(k-1) even for k < 0,
so

0 = dE/dr
  = -12k a^12 r^-13 + 12k a^6 r^-7

To simplify, we divide both sides by 12k. The left side was already zero,
so it keeps being zero.

0 = -a^12 r^-13 + a^6 r^-7
a^12 r^-13 = a^6 r^-7
a^12 = a^6 r^6
a^6 = r^6
r = ±a

To check that this is a minimum, not a maximum or a point of inflection, 
one method is to construct a graph. The constants a and k are irrelevant 
to this issue. Changing a just rescales the horizontal r axis, and changing 
k does the same for the vertical E axis. That means we can arbitrarily 
set a = 1 and k = 1, and construct the graph shown in the figure. The 
points r = ±a are now simply r = ±1. From the graph, we can see 
that they're clearly minima. Physically, the minimum at r = -a can
be interpreted as the same physical configuration of the molecule, but
with the positions of the atoms reversed. It makes sense that r = -a
behaves the same as r = a, since physically the behavior of the system 
has to be symmetric, regardless of whether we view it from in front or 
from behind. 



The other method of checking that r = a is a minimum is to take the 
second derivative. As before, the values of a and k are irrelevant, and 
can be set to 1. We then have 

dE/dr = -12r^-13 + 12r^-7
d^2E/dr^2 = 156r^-14 - 84r^-8

e / Problem 16.

Plugging in r = ±1, we get a positive result, which confirms that the 
concavity is upward. 
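
As a quick numerical cross-check of the second-derivative test, the two derivative formulas can be evaluated at r = ±1. This is just an illustrative Python sketch with a = 1 and k = 1, as in the solution; the function names dE and d2E are made up for the example.

```python
def dE(r):
    """First derivative of E(r) = r^-12 - 2 r^-6 (a = k = 1)."""
    return -12 * r**-13 + 12 * r**-7

def d2E(r):
    """Second derivative of the same function."""
    return 156 * r**-14 - 84 * r**-8

for r in (1.0, -1.0):
    # The first derivative vanishes and the second is positive at both points.
    print(r, dE(r), d2E(r))
```

At both points the first derivative comes out as zero and the second derivative as 156 - 84 = 72, which is positive.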

page 22, problem 17: 

Since polynomials don't have kinks or endpoints in their graphs, the 
maxima and minima must be points where the derivative is zero. Dif- 
ferentiation bumps down all the powers of a polynomial by one, so the 
derivative of a third-order polynomial is a second-order polynomial. A 
second-order polynomial can have at most two real roots (values of t for
which it equals zero), which are given by the quadratic formula. (If the
number inside the square root in the quadratic formula is zero or negative,
there could be fewer than two real roots.) That means a third-order
polynomial can have at most two maxima or minima.

page 22, problem 18: 

Considering V as a function of h, with b treated as a constant, we have
for the slope of its graph

dV/dh = e_V / e_h

e_V = (dV/dh) e_h
    = (1/3) b e_h

page 22, problem 19: 

Thinking of the rocket's height as a function of time, we can see that
the goal is to measure the function at its maximum. The derivative is zero
at the maximum, so the error incurred due to timing is approximately
zero. She should not worry about the timing error too much. Other 
factors are likely to be more important, e.g., the rocket may not rise 
exactly vertically above the launchpad. 

page 23, problem 20: If ẋ = n^2, and x is a polynomial in n, then
we must have ẋ(n) = x(n) - x(n - 1) = n^2. If x is a polynomial of
order k, then x(n) and x(n - 1) both have n^k terms with the same
coefficient, so ẋ has no n^k term. We want ẋ to have a nonvanishing n^2
term, so we must have k ≥ 3. For k > 3, it's easy to show that the
n^(k-1) term in x(n) - x(n - 1) is nonzero, so we must have k = 3. Let
x(n) = an^3 + bn^2 + ..., where a is the coefficient that we want to prove
is 1/3, and ... represents lower-order terms. By the binomial theorem,
we have x(n - 1) = an^3 - 3an^2 + bn^2 + ..., and subtracting this from
x(n) gives ẋ(n) = 3an^2 + .... Since 3a = 1, we have a = 1/3.
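
The conclusion a = 1/3 can be checked against a concrete polynomial. The sketch below, in Python, assumes the standard sum-of-squares formula x(n) = n^3/3 + n^2/2 + n/6; the lower-order coefficients 1/2 and 1/6 are not derived in this solution and are taken as given for the check. Exact rational arithmetic from the standard fractions module avoids rounding issues.

```python
from fractions import Fraction

def x(n):
    """Candidate running sum: n^3/3 + n^2/2 + n/6 (assumed formula)."""
    n = Fraction(n)
    return n**3 / 3 + n**2 / 2 + n / 6

# The discrete derivative x(n) - x(n-1) should reproduce n^2 exactly.
for n in range(1, 6):
    print(n, x(n) - x(n - 1))
```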

Solutions for chapter 2 

page 46, problem 1: 

dx/dt = [(t + dt)^4 - t^4] / dt
      = [4t^3 dt + 6t^2 dt^2 + 4t dt^3 + dt^4] / dt
      = 4t^3 + ...

where ... indicates infinitesimal terms. The derivative is the standard
part of this, which is 4t^3.

page 46, problem 2: 

dx/dt = [cos(t + dt) - cos t] / dt

The identity cos(α + β) = cos α cos β - sin α sin β then gives

dx/dt = [cos t cos dt - sin t sin dt - cos t] / dt

The small-angle approximations cos dt ≈ 1 and sin dt ≈ dt result in

dx/dt = (-sin t dt) / dt
      = -sin t

page 46, problem 3:

H                 √(H+1) - √(H-1)
1000              0.032
1,000,000         0.0010
1,000,000,000     0.000032

The result is getting smaller and smaller, so it seems reasonable to guess 
that if H is infinite, the expression gives an infinitesimal result. 
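
The table entries can be reproduced with a few lines of code. The following Python sketch is only an illustration; the function name gap is made up, and it uses the algebraically equivalent form 2/(√(H+1) + √(H-1)), which avoids subtracting two nearly equal floating-point numbers when H is large.

```python
import math

def gap(H):
    # Equal to sqrt(H + 1) - sqrt(H - 1), rewritten to avoid
    # cancellation between two nearly equal square roots.
    return 2.0 / (math.sqrt(H + 1) + math.sqrt(H - 1))

for H in (1e3, 1e6, 1e9):
    print(f"{H:.0e}  {gap(H):.2g}")
```

The values shrink roughly like 1/√H, which is consistent with the guess that the expression is infinitesimal when H is infinite.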

page 46, problem 4: 

dx        √dx
.1        .32
.001      .032
.00001    .0032

The square root is getting smaller, but is not getting smaller as fast as
the number itself. In proportion to the original number, the square root
is actually getting bigger. It looks like √dx is infinitesimal, but it's still
infinitely big compared to dx. This makes sense, because √dx equals
dx^(1/2), and we already knew that dx^0, which equals 1, was infinitely big
compared to dx^1, which equals dx. In the hierarchy of infinitesimals,
dx^(1/2) fits in between dx^0 and dx^1.

page 46, problem 5: 

Statements (a)-(d), and (f)-(g) are all valid for the hyperreals, because 
they meet the test of being directly translatable, without having to 
interpret the meaning of things like particular subsets of the reals in the 
context of the hyperreals. 

Statement (e), however, refers to the rational numbers, a particular 
subset of the reals, and that means that it can't be mindlessly translated 
into a statement about the hyperreals, unless we had figured out a way 
to translate the set of rational numbers into some corresponding subset 
of the hyperreal numbers like the hyperrationals! This is not the type of 
statement that the transfer principle deals with. The statement is not 
true if we try to change "real" to "hyperreal" while leaving "rational" 
alone; for example, it's not true that there's a rational number that lies 
between the hyperreal numbers 0 and 0 + dx, where dx is infinitesimal.

page 46, problem 6: If R_1 is finite and R_2 infinite, then 1/R_2 is
infinitesimal, 1/R_1 + 1/R_2 differs infinitesimally from 1/R_1, and the
combined resistance R differs infinitesimally from R_1. Physically, the
second pipe is blocked or too thin to carry any significant flow, so it's
as though it weren't present.

If R_1 is finite and R_2 is infinitesimal, then 1/R_2 is infinite, 1/R_1 + 1/R_2
is also infinite, and the combined resistance R is infinitesimal. It's so
easy for water to flow through R_2 that R_1 might as well not be present.
In the context of electrical circuits rather than water pipes, this is known 
as a short circuit. 

page 47, problem 7: The velocity addition is only interesting if the
infinitesimal velocities u and v are comparable to one another, i.e., their
ratio is finite. Let's write ε for the size of these infinitesimals, so that
both u and v can be written as ε multiplied by some finite number.
Then 1 + uv differs from 1 by an amount that is on the order of ε^2,
which is infinitesimally small compared to ε. The same then holds true
for 1/(1 + uv) as well. The result of velocity addition (u + v)/(1 + uv)
is then u + v + ..., where ... represents quantities of order ε^3, which
amount to a correction that is infinitesimally small compared to the
nonrelativistic result u + v.
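
The claim that the correction is of order ε^3 can be illustrated with ordinary small numbers standing in for infinitesimals. The Python sketch below is an assumption-laden illustration: the function name combine, and the particular choices u = 2ε and v = 3ε, are made up for the example, and velocities are in units where the speed of light is 1, as the formula (u + v)/(1 + uv) implies.

```python
def combine(u, v):
    """Relativistic velocity addition in units where c = 1."""
    return (u + v) / (1 + u * v)

for eps in (1e-2, 1e-3, 1e-4):
    u, v = 2 * eps, 3 * eps            # hypothetical comparable velocities
    correction = combine(u, v) - (u + v)
    # If the correction scales like eps^3, this ratio stays roughly constant.
    print(eps, correction / eps**3)
```

For these choices the exact correction is -(u + v)uv/(1 + uv), so the printed ratio should stay close to -30 as ε shrinks.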

page 47, problem 8: This would be a horrible problem if we had to 
expand this as a polynomial with 101 terms, as in chapter 1! But now 
we know the chain rule, so it's easy. The derivative is 

[100(2x + 3)^99] [2]

where the first factor in brackets is the derivative of the function on
the outside, and the second one is the derivative of the "inside stuff."
Simplifying a little, the answer is 200(2x + 3)^99.

page 47, problem 9: 

Applying the product rule, we get 

100(x + 1)^99 (x + 2)^200 + 200(x + 1)^100 (x + 2)^199

(The chain rule was also required, but in a trivial way: for both of
the factors, the derivative of the "inside stuff" was one.)

page 47, problem 10: 

The derivative of e^(7x) is e^(7x) · 7, where the first factor is the derivative of
the outside stuff (the derivative of a base-e exponential is just the same
thing), and the second factor is the derivative of the inside stuff. This
would normally be written as 7e^(7x).

The derivative of the second function is e^(e^x) e^x, with the second
exponential factor coming from the chain rule.

page 47, problem 11: 

We need to put together three different ideas here: (1) When a function
to be differentiated is multiplied by a constant, the constant just comes
along for the ride. (2) The derivative of the sine is the cosine. (3) We
need to use the chain rule. The result is ab cos(bx + c).

page 47, problem 13: 

If we just wanted to find the integral of sin x, the answer would be -cos x
(or -cos x plus an arbitrary constant), since the derivative would be
-(-sin x), which would take us back to the original function. The
obvious thing to guess for the integral of a sin(bx + c) would therefore
be -a cos(bx + c), which almost works, but not quite. The derivative of
this function would be ab sin(bx + c), with the pesky factor of b coming
from the chain rule. Therefore what we really wanted was the function
-(a/b) cos(bx + c).

page 47, problem 14: 

The chain rule gives 

d/dx ((x^2)^2)^2 = 2((x^2)^2)(2(x^2))(2x) = 8x^7,

which is the same as the result we would have gotten by differentiating
x^8.

page 47, problem 15: 

To find a maximum, we take the derivative and set it equal to zero. The
whole factor of 2v^2/g in front is just one big constant, so it comes along
for the ride. To differentiate the factor of sin θ cos θ, we need to use
the product rule, plus the fact that the derivative of sin is cos, and the
derivative of cos is -sin.

0 = (2v^2/g)(cos θ cos θ + sin θ(-sin θ))
0 = cos^2 θ - sin^2 θ
cos θ = ± sin θ

We're interested in angles between 0 and 90 degrees, for which both
the sine and the cosine are positive, so

cos θ = sin θ
tan θ = 1
θ = 45°

To check that this is really a maximum, not a minimum or an inflection
point, we could resort to the second derivative test, but we know the
graph of R(θ) is zero at θ = 0 and θ = 90°, and positive in between, so
this must be a maximum.
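
A brute-force scan over whole-degree angles gives the same answer. This Python sketch is only an illustration: it drops the constant factor 2v^2/g, which does not affect where the maximum occurs, and the function name range_factor is made up for the example.

```python
import math

def range_factor(theta_deg):
    """sin(theta) * cos(theta); the constant 2v^2/g is omitted."""
    theta = math.radians(theta_deg)
    return math.sin(theta) * math.cos(theta)

# Scan whole-degree angles from 0 to 90 and pick the one with the largest value.
best = max(range(0, 91), key=range_factor)
print(best)
```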

page 47, problem 16: 

Taking the derivative and setting it equal to zero, we have
(e^x - e^-x)/2 = 0, so e^x = e^-x, which occurs only at x = 0. The
second derivative is (e^x + e^-x)/2 (the same as the original function),
which is positive for all x, so the function is everywhere concave up, and
this is a minimum.

page 47, problem 17: 

There are no kinks, endpoints, etc., so extrema will occur only in places
where the derivative is zero. Applying the chain rule, we find the derivative
to be cos(sin(sin x)) cos(sin x) cos x. This will be zero if any of the
three factors is zero. We have cos u = 0 only when |u| ≥ π/2, and π/2
is greater than 1, so it's not possible for either of the first two factors
to equal zero. The derivative will therefore equal zero if and only if
cos x = 0, which happens in the same places where the derivative of
sin x is zero, at x = π/2 + πn, where n is an integer.




f / Problem 17. 

This essentially completes the required demonstration, but there is one 
more technical issue, which is that it's conceivable that some of these 
could be points of inflection. Constructing a graph of sin(sin(sin x))
gives us the necessary insight to see that this can't be the case. The 
function essentially looks like the sine function, but its extrema have 
been "shaved down" a little, giving them slightly flatter tips that don't 
quite extend out to ±1. It's therefore fairly clear that these aren't points 
of inflection. To prove this more rigorously, we could take the second 
derivative and show that it was nonzero at the places where the first 
derivative is zero. That would be messy. A less tedious argument is 
as follows. We can tell from its formula that the function is periodic, 
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i.e., it has the property that f(x + £) = f(x), for £ = 2ir. This follows 
because the innermost sine function is periodic, and the outer layers 
only depend on the result of the inner layer. Therefore all the points of 
the form ir/2 + 27m have the same behavior. Either they're all maxima 
or they're all points of inflection. But clearly a function can't oscillate 
back and forth without having any maxima at all, so they must all be 
maxima. A similar argument applies to the minima. 
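
The conclusion can also be checked numerically. The following is only a
sketch, not part of the original solution: it evaluates the chain-rule
derivative at the points $x = \pi/2 + \pi n$ and estimates the second
derivative there with a finite difference. The step size `h` and the helper
names are arbitrary choices made for this illustration.

```python
import math

def f(x):
    """The function from the problem: sin(sin(sin x))."""
    return math.sin(math.sin(math.sin(x)))

def fprime(x):
    """Derivative found with the chain rule in the solution."""
    return (math.cos(math.sin(math.sin(x)))
            * math.cos(math.sin(x))
            * math.cos(x))

def second_derivative(x, h=1e-4):
    """Finite-difference estimate of f''(x); h is an arbitrary small step."""
    return (f(x + h) - 2 * f(x) + f(x - h)) / h**2

for n in range(-2, 3):
    x = math.pi / 2 + math.pi * n
    kind = "maximum" if second_derivative(x) < 0 else "minimum"
    print(n, round(f(x), 6), kind)
```

A nonzero second derivative at each of these points is what rules out a
point of inflection; the printed values of the function also stay inside
the range from $-1$ to 1, as described above.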

page 48, problem 18: 

(a) As suggested, let $c = \sqrt{g/A}$, so that
$d = A \ln\cosh ct = A\ln\left[\frac{1}{2}\left(e^{ct} + e^{-ct}\right)\right]$.
Applying the chain rule, the velocity is

$$A\,\frac{ce^{ct} - ce^{-ct}}{e^{ct} + e^{-ct}}.$$

(b) The expression can be rewritten as $Ac\tanh ct$.

(c) For large $t$, the $e^{-ct}$ terms become negligible, so the velocity is
$Ace^{ct}/e^{ct} = Ac$.

(d) From the original expression, $A$ must have units of distance, since
the logarithm is unitless. Also, since $ct$ occurs inside a function, $ct$
must be unitless, which means that $c$ has units of inverse time. The
answers to parts b and c get their units from the factors of $Ac$, which
have units of distance multiplied by inverse time, or velocity.
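
As a rough check on parts a through c, the sketch below compares a
finite-difference derivative of $d = A\ln\cosh ct$ with the closed form
$Ac\tanh ct$, and confirms that the velocity levels off at $Ac$. The values
of `A` and `c` are made up for the illustration and do not come from the
problem.

```python
import math

def distance(t, A, c):
    """d = A ln cosh(ct), the expression differentiated in part a."""
    return A * math.log(math.cosh(c * t))

def velocity_formula(t, A, c):
    """Closed form from part b: A c tanh(ct)."""
    return A * c * math.tanh(c * t)

def velocity_numeric(t, A, c, h=1e-6):
    """Central-difference estimate of dd/dt; h is an arbitrary small step."""
    return (distance(t + h, A, c) - distance(t - h, A, c)) / (2 * h)

A, c = 2.0, 1.5  # hypothetical values, chosen only for this check
for t in (0.1, 0.7, 3.0, 20.0):
    print(t, velocity_numeric(t, A, c), velocity_formula(t, A, c))
```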

page 48, problem 19: 

Since I've advocated not memorizing the quotient rule, I'll do this one
from first principles, using the product rule.

$$\frac{d}{d\theta}\tan\theta
  = \frac{d}{d\theta}\left(\frac{\sin\theta}{\cos\theta}\right)$$
$$= \frac{d}{d\theta}\left[\sin\theta\,(\cos\theta)^{-1}\right]$$
$$= \cos\theta\,(\cos\theta)^{-1} + (\sin\theta)(-1)(\cos\theta)^{-2}(-\sin\theta)$$
$$= 1 + \tan^2\theta$$

(Using a trig identity, this can also be rewritten as $\sec^2\theta$.)

page 48, problem 20: 

Reexpressing $\sqrt[3]{x}$ as $x^{1/3}$, the derivative is $(1/3)x^{-2/3}$.

page 48, problem 21: 

(a) Using the chain rule, the derivative of $(x^2 + 1)^{1/2}$ is
$(1/2)(x^2 + 1)^{-1/2}(2x) = x(x^2 + 1)^{-1/2}$.

(b) This is the same as a, except that the 1 is replaced with an $a^2$, so
the answer is $x(x^2 + a^2)^{-1/2}$. The idea would be that $a$ has the same
units as $x$.

(c) This can be rewritten as $(a + x)^{-1/2}$, giving a derivative of
$(-1/2)(a + x)^{-3/2}$.

(d) This is similar to c, but we pick up a factor of $-2x$ from the chain
rule, making the result $ax(a - x^2)^{-3/2}$.
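
The four results can be compared against finite-difference estimates. This
is only a sketch: the forms of the functions in parts c and d are inferred
from the answers above (in particular, part d is taken to be
$a(a - x^2)^{-1/2}$), and the values of `a` and `x` are arbitrary.

```python
import math

def central_difference(func, x, h=1e-6):
    """Numerical derivative; h is an arbitrary small step."""
    return (func(x + h) - func(x - h)) / (2 * h)

a = 3.0  # hypothetical constant
x = 0.8  # hypothetical sample point, chosen so that a - x**2 > 0

# Each entry pairs a function with the derivative claimed in the solution.
cases = {
    "(a)": (lambda u: math.sqrt(u**2 + 1),
            lambda u: u * (u**2 + 1) ** -0.5),
    "(b)": (lambda u: math.sqrt(u**2 + a**2),
            lambda u: u * (u**2 + a**2) ** -0.5),
    "(c)": (lambda u: (a + u) ** -0.5,
            lambda u: -0.5 * (a + u) ** -1.5),
    "(d)": (lambda u: a * (a - u**2) ** -0.5,
            lambda u: a * u * (a - u**2) ** -1.5),
}

for label, (func, deriv) in cases.items():
    print(label, central_difference(func, x), deriv(x))
```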

page 48, problem 22: 

By the chain rule, the result is 2/(2t + 1). 

page 48, problem 23: 

Using the product rule, we have

$$\left(\frac{d}{dx}3\right)\sin x + 3\left(\frac{d}{dx}\sin x\right),$$

but the derivative of a constant is zero, so the first term goes away, and
we get $3\cos x$, which is what we would have had just from the usual
method of treating multiplicative constants.

page 48, problem 24: 



N(Gamma(2))

1

N(Gamma(2.00001))

1.0000042278

N( (1.0000042278-1)/(.00001) )

0.4227799998

Probably only the first few digits of this are reliable. 
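
The same estimate can be reproduced with Python's standard library, which
provides the gamma function as `math.gamma`. This is a sketch using the same
step size as the session above; the central difference is an addition for
comparison and is not part of the original solution.

```python
import math

h = 1e-5  # same step as in the session above

# Forward difference, mirroring the calculation in the solution.
forward = (math.gamma(2 + h) - math.gamma(2)) / h

# Central difference, which is usually more accurate for the same step.
central = (math.gamma(2 + h) - math.gamma(2 - h)) / (2 * h)

print(forward, central)
```

Working in double precision avoids the loss of digits that comes from
subtracting a ten-digit printout from 1 by hand, which is why the remark
above about only the first few digits being reliable applies to the
original session.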

page 49, problem 25: 

The area and volume are

$$A = 2\pi r\ell + 2\pi r^2$$

and

$$V = \pi r^2\ell.$$

The strategy is to use the equation for $A$, which is a constant, to
eliminate the variable $\ell$, and then maximize $V$ in terms of $r$.

$$\ell = (A - 2\pi r^2)/2\pi r$$

Substituting this expression for $\ell$ back into the equation for $V$,

$$V = \frac{1}{2}rA - \pi r^3.$$

To maximize this with respect to $r$, we take the derivative and set it
equal to zero.

$$0 = \frac{1}{2}A - 3\pi r^2$$
$$A = 6\pi r^2$$
$$\ell = (6\pi r^2 - 2\pi r^2)/2\pi r$$
$$\ell = 2r$$



In other words, the length should be the same as the diameter. 
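
A numerical search gives the same answer. The sketch below fixes the
surface area at an arbitrary value, eliminates $\ell$ as in the solution,
and looks for the radius that maximizes the volume with a ternary search
(a technique chosen for this illustration, not one used in the book).

```python
import math

def length_for(r, A):
    """Cylinder length implied by a fixed total surface area A."""
    return (A - 2 * math.pi * r**2) / (2 * math.pi * r)

def volume(r, A):
    """V = pi r^2 l with l eliminated using the area constraint."""
    return math.pi * r**2 * length_for(r, A)

def best_radius(A, iterations=200):
    """Ternary search for the maximum of V(r), which has a single peak."""
    lo, hi = 1e-9, math.sqrt(A / (2 * math.pi))
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if volume(m1, A) < volume(m2, A):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2

A = 10.0  # hypothetical surface area, arbitrary units
r = best_radius(A)
print(r, length_for(r, A), length_for(r, A) / r)
```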

page 49, problem 26: 

(a) We can break the expression down into three factors: the constant
$m/2$ in front, the nonrelativistic velocity dependence $v^2$, and the
relativistic correction factor $(1 - v^2/c^2)^{-1/2}$. Rather than
substituting in $at$ for $v$, it's a little less messy to calculate
$dK/dt = (dK/dv)(dv/dt) = a\,dK/dv$. Using the product rule, we have

$$\frac{dK}{dt} = a\cdot\frac{1}{2}m\left[2v\left(1 - \frac{v^2}{c^2}\right)^{-1/2}
  + v^2\cdot\left(-\frac{1}{2}\right)\left(1 - \frac{v^2}{c^2}\right)^{-3/2}
  \left(-\frac{2v}{c^2}\right)\right]$$
$$= ma^2t\left[\left(1 - \frac{v^2}{c^2}\right)^{-1/2}
  + \frac{v^2}{2c^2}\left(1 - \frac{v^2}{c^2}\right)^{-3/2}\right]$$

(b) The expression $ma^2t$ is the nonrelativistic (classical) result, and has
the correct units of kinetic energy divided by time. The factor in square
brackets is the relativistic correction, which is unitless.

(c) As $v$ gets closer and closer to $c$, the expression $1 - v^2/c^2$
approaches zero, so both the terms in the relativistic correction blow up to
positive infinity.

page 49, problem 27: 

We already know it works for positive $x$, so we only need to check it
for negative $x$. For negative values of $x$, the chain rule tells us that
the derivative is $1/|x|$, multiplied by $-1$, since $d|x|/dx = -1$. This
gives $-1/|x|$, which is the same as $1/x$, since $x$ is assumed negative.

page 49, problem 28: 

Since $f(x) = f(-x)$,

$$\frac{df(x)}{dx} = \frac{df(-x)}{dx}.$$

But by the chain rule, the right-hand side equals $-f'(-x)$, as claimed.

page 49, problem 30: 

Let $f = d(x^k)/dx$ be the unknown function. Then

$$1 = \frac{dx}{dx} = \frac{d}{dx}\left(x^k x^{-k+1}\right)
  = fx^{-k+1} + x^k(-k+1)x^{-k},$$

where we can use the ordinary rule for derivatives of powers on $x^{-k+1}$,
since $-k + 1$ is positive. Solving for $f$, we have the desired result,
$f = kx^{k-1}$.

page 49, problem 31: Since the parallel postulate can be expressed 
in terms of algebra through Cartesian geometry, the transfer principle 
tells us that it holds for F as well. But G is defined in terms of the 
finite hyperreals, so statements about E don't carry over to statements 
about G simply by replacing "real" with "hyperreal," and the transfer 
principle does not guarantee that the parallel postulate applies to G. 

In fact, it is easy to find a counterexample in G. Let $\epsilon$ be an
infinitesimal number. Consider the lines with equations $y = 1$ and
$y = 1 + \epsilon x$. Neither of these intersects the $x$ axis.

No, it is not valid to associate only E with the plane described by Eu- 
clid's axioms. All of Euclid's axioms hold equally well in F. F is referred 
to as a nonstandard model of Euclid's axioms. It has the same relation 
to standard Euclidean geometry as the hyperreals have to the reals. If 
we want to make up a set of axioms that describes E and can't describe 
F, then we need to add an additional axiom to Euclid's set. An example
of such an axiom would be an axiom stating that given any two line
segments with lengths $\ell_1$ and $\ell_2$, there exists some integer $n$
such that $n\ell_1 > \ell_2$. Note that although this axiom holds in E, the
transfer principle cannot be used to show that it holds in F; in fact, it is
false in F. The transfer principle doesn't apply because it does not cover
statements that include phrases such as "for any integer."

page 50, problem 32:

The normal definition of a repeating decimal such as 0.999 ... is that it 
is the limit of the sequence 0.9, 0.99, . . ., and the limit is a real number, 
by definition. 0.999 . . . equals 1. However, there is an intuition that the 
limiting process 0.9, 0.99, . . . "never quite gets there." This intuition 
can, in fact, be formalized in the construction described beginning on 
page 141; we can define a hyperreal number based on the sequence 
0.9, 0.99, . . ., and it is a number infinitesimally less than one. This is 
not, however, the normal way of defining the symbol 0.999 . . ., and we 
probably wouldn't want to change the definition so that it was. If it 
was, then 0.333 . . . would not equal 1/3. 

page 50, problem 33: 

Converting these into Leibniz notation, we find

$$\frac{df}{dx} = \frac{dg}{dh}$$

and

$$\frac{df}{dx} = \frac{dg}{dh}\,h.$$

To prove something is not true in general, it suffices to find one
counterexample. Suppose that $g$ and $h$ are both unitless, and $x$ has
units of seconds. The value of $f$ is defined by the output of $g$, so $f$
must also be unitless. Since $f$ is unitless, $df/dx$ has units of inverse
seconds ("per second"). But this doesn't match the units of either of the
proposed expressions, because they're both unitless. The correct chain rule,
however, works. In the equation

$$\frac{df}{dx} = \frac{dg}{dh}\,\frac{dh}{dx},$$

the right-hand side consists of a unitless factor multiplied by a factor
with units of inverse seconds, so its units are inverse seconds, matching
the left-hand side.

page 50, problem 34: 

We can make life a lot easier by observing that the function $s(f)$ will
be maximized when the expression inside the square root is minimized.
Also, since $f$ is squared every time it occurs, we can change to a variable
$x = f^2$, and then once the optimal value of $x$ is found we can take its
square root in order to find the optimal $f$. The function to be optimized
is then

$$a(x - f_0^2)^2 + bx.$$

Differentiating this and setting the derivative equal to zero, we find

$$2a(x - f_0^2) + b = 0,$$

which results in $x = f_0^2 - b/2a$, or

$$f = \sqrt{f_0^2 - b/2a}$$

(choosing the positive root, since $f$ represents a frequency, and
frequencies are positive by definition). Note that the quantity inside the
square root involves the square of a frequency, but then we take its
square root, so the units of the result turn out to be frequency, which
makes sense. We can see that if $b$ is small, the second term is small, and
the maximum occurs very nearly at $f_0$.

There is one subtle issue that was glossed over above, which is that
the graph on page 50 shows two extrema: a minimum at $f = 0$ and a
maximum at $f > 0$. What happened to the $f = 0$ minimum? The issue
is that I was a little sloppy with the change of variables. Let $I$ stand
for the quantity inside the square root in the original expression for $s$.
Then by the chain rule,

$$\frac{ds}{df} = \frac{ds}{dI}\cdot\frac{dI}{dx}\cdot\frac{dx}{df}.$$

We looked for the place where $dI/dx$ was zero, but $ds/df$ could also be
zero if one of the other factors was zero. This is what happens at $f = 0$,
where $dx/df = 0$.

page 50, problem 35: 



1 _ 1 
1 1 



1 



f ' 1 1 + dx/f 
Applying the geometric series $1/(1 + r) = 1 - r + r^2 - \ldots$,

,-/(i-(i- 7 

= p 

dx 

As checks on our result, we note that the units work out correctly (me- 
ters squared divided by meters give meters) , and that the result is indeed 
large, since we divide by the small quantity dx. 

page 51, problem 36: One way to evaluate an expression like $a^b$ is by
using the identity $a^b = e^{b\ln a}$. If we try to substitute $a = 1$ and
$b = \infty$, we get $e^{\infty\cdot 0}$, which has an indeterminate form
inside the exponential. One way to express the idea is that if there is even
the tiniest error in the value of $a$, the value of $a^\infty$ can have any
positive value.

Solutions for chapter 3 

page 67, problem 1: 

(a) The Weierstrass definition requires that, given a particular
$\epsilon$, we be able to find a $\delta$ so small that $f(x) + g(x)$
differs from $F + G$ by at most $\epsilon$ for $|x - a| < \delta$. But the
Weierstrass definition also tells us that given $\epsilon/2$, we can find a
$\delta$ such that $f$ differs from $F$ by at most $\epsilon/2$, and
likewise for $g$ and $G$. The amount by which $f + g$ differs from $F + G$
is then at most $\epsilon/2 + \epsilon/2$, which completes the proof.

(b) Let $dx$ be infinitesimal. Then the definition of the limit in terms of
infinitesimals says that $f(a + dx)$ differs at most infinitesimally from
$F$, and likewise for $g$ and $G$. This means that $f + g$ differs from
$F + G$ by the sum of two infinitesimals, which is itself an infinitesimal,
and therefore the standard part of $f + g$ evaluated at $a + dx$ equals
$F + G$, satisfying the definition.

page 67, problem 2: 

The shape of the graph can be found by considering four cases: large 
negative x, small negative x, small positive x, and large positive x. In 
these four cases, the function is respectively close to 1, large, small, and 
close to 1. 



The four limits correspond to the four cases described above.

g / Problem 2.

page 67, problem 3: All five of these can be done using l'Hopital's
rule:

$$\lim_{s\to 1}\frac{s^3 - 1}{s - 1} = \lim\frac{3s^2}{1} = 3$$

$$\lim_{\theta\to 0}\frac{1 - \cos\theta}{\theta^2}
  = \lim\frac{\sin\theta}{2\theta} = \lim\frac{\cos\theta}{2} = \frac{1}{2}$$

$$\lim_{x\to\infty}\frac{5x^2 - 2x}{x} = \lim\frac{10x - 2}{1} = \infty$$

$$\lim_{n\to\infty}\frac{n(n + 1)}{(n + 2)(n + 3)}
  = \lim\frac{n^2 + \ldots}{n^2 + \ldots}
  = \lim\frac{2n + \ldots}{2n + \ldots} = \lim\frac{2}{2} = 1$$

$$\lim_{x\to\infty}\frac{ax^2 + bx + c}{dx^2 + ex + f}
  = \lim\frac{2ax + \ldots}{2dx + \ldots} = \lim\frac{2a}{2d} = \frac{a}{d}$$

In examples 2, 4, and 5, we differentiate more than once in order to
get an expression that can be evaluated by substitution. In 4 and 5,
$\ldots$ represents terms that we anticipate will go away after the second
differentiation. Most people probably would not bother with l'Hopital's
rule for 3, 4, or 5, being content merely to observe the behavior of the
highest-order term, which makes the limiting behavior obvious. Examples
3, 4, and 5 can also be done rigorously without l'Hopital's rule, by
algebraic manipulation; we divide on the top and bottom by the highest
power of the variable, giving an expression that is no longer an
indeterminate form $\infty/\infty$.
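
The five limits can be sanity-checked by plugging in values close to the
limiting point. This is a sketch rather than a proof: the five expressions
are written out as $(s^3-1)/(s-1)$, $(1-\cos\theta)/\theta^2$,
$(5x^2-2x)/x$, $n(n+1)/[(n+2)(n+3)]$, and $(ax^2+bx+c)/(dx^2+ex+f)$, and the
coefficients `a` through `f` are made-up values.

```python
import math

# Hypothetical coefficients for the fifth example.
a, b, c, d, e, f = 2.0, -1.0, 4.0, 5.0, 3.0, 7.0

def example1(s):
    return (s**3 - 1) / (s - 1)

def example2(theta):
    return (1 - math.cos(theta)) / theta**2

def example3(x):
    return (5 * x**2 - 2 * x) / x

def example4(n):
    return n * (n + 1) / ((n + 2) * (n + 3))

def example5(x):
    return (a * x**2 + b * x + c) / (d * x**2 + e * x + f)

print(example1(1 + 1e-6))   # expected to be near 3
print(example2(1e-4))       # expected to be near 1/2
print(example3(1e6))        # grows without bound
print(example4(1e9))        # expected to be near 1
print(example5(1e9), a / d) # expected to be near a/d
```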

page 67, problem 4: 

Both numerator and denominator go to zero, so we can apply l'Hopital's
rule. Differentiating top and bottom gives
$(\cos x - x\sin x)/(-\ln 2\cdot 2^x)$, which equals $-1/\ln 2$ at $x = 0$.
To check this numerically, we plug $x = 10^{-3}$ into the original
expression. The result is $-1.44219$, which is very close to
$-1/\ln 2 = -1.44269\ldots$.
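
The numerical check can be reproduced in a couple of lines. In this sketch
the original expression is taken to be $x\cos x/(1 - 2^x)$, which is
inferred from the derivatives quoted above rather than copied from the
problem statement.

```python
import math

def expression(x):
    """Assumed original expression: x cos x / (1 - 2^x)."""
    return x * math.cos(x) / (1 - 2**x)

value = expression(1e-3)
limit = -1 / math.log(2)
print(value, limit)
```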

page 67, problem 5: 

L'Hopital's rule only works when both the numerator and the denomi- 
nator go to zero. 

page 67, problem 6: Applying l'Hopital's rule once gives

$$\lim_{u\to 0}\frac{2u}{e^u - e^{-u}},$$

which is still an indeterminate form. Applying the rule a second time,
we get

$$\lim_{u\to 0}\frac{2}{e^u + e^{-u}} = 1.$$

As a numerical check, plugging u = 0.01 into the original expression 
results in 0.9999917. 

page 67, problem 7: L'Hopital's rule gives $\cos t/1 \to -1$. Plugging in
$t = 3.1$ gives $-0.9997$.
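
Both of the numerical checks quoted for problems 6 and 7 are easy to
repeat. In this sketch the original expressions are assumed to be
$u^2/(e^u + e^{-u} - 2)$ and $\sin t/(t - \pi)$; these forms are inferred
from the derivatives in the solutions and are not quoted from the problems.

```python
import math

def problem6(u):
    """Assumed expression for problem 6: u^2 / (e^u + e^-u - 2)."""
    return u**2 / (math.exp(u) + math.exp(-u) - 2)

def problem7(t):
    """Assumed expression for problem 7: sin t / (t - pi)."""
    return math.sin(t) / (t - math.pi)

print(problem6(0.01))  # the solution quotes 0.9999917
print(problem7(3.1))   # the solution quotes -0.9997
```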

page 67, problem 8: Let $u = 1/x$. Then

$$\frac{df/dx}{dg/dx} = \frac{df/du}{dg/du}$$

simply by algebraic manipulation of the infinitesimals. (If we want to
interpret these quantities as derivatives, then our notational convention
is that they stand for the standard parts of the quotients of the
infinitesimals, in which case the equality is only for the standard parts.)
This equality holds not just in the limit but everywhere that the functions
are differentiable. The expression on the left is the thing whose limit
we're trying to prove equals $\lim f/g$. The right-hand side is equal to
$\lim f/g$ by the previously established form of l'Hopital's rule.

page 67, problem 9: By the definition of continuity in terms of
infinitesimals, the function is continuous, because an infinitesimal change
$dx$ leads to a change $dy = a\,dx$ in the output of the function which is
likewise infinitesimal. (This depends on the fact that $a$ is assumed to
be real, which implies that it is finite.)

Continuity in terms of the Weierstrass limit holds because we can take
$\delta = \epsilon/a$.

Solutions for chapter 4



page 81, problem 1: 



a := 0;
b := 1;
H := 1000;
dt := (b-a)/H;
sum := 0;
t := a;
While (t<=b) [
  sum := N(sum+Exp(t^2)*dt);
  t := N(t+dt);
];
Echo(sum);







The result is 1.46. 
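
For readers without Yacas, the same sum can be sketched in Python. This is
an illustration rather than a translation of the listing: it loops over an
integer index instead of accumulating `t`, so that rounding cannot change
the number of terms, and the function name is invented here.

```python
import math

def left_riemann_sum(func, a, b, steps):
    """Sum func(t)*dt over `steps` equal slices, sampling at the left edge."""
    dt = (b - a) / steps
    total = 0.0
    for i in range(steps):
        total += func(a + i * dt) * dt
    return total

result = left_riemann_sum(lambda t: math.exp(t**2), 0.0, 1.0, 1000)
print(round(result, 2))  # prints 1.46
```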




h / Problem 2. 



page 81, problem 2: 

The derivative of the cosine is minus the sine, so to get a function whose
derivative is the sine, we need minus the cosine.

$$\int_0^{2\pi}\sin x\,dx = (-\cos x)\Big|_0^{2\pi}
  = (-\cos 2\pi) - (-\cos 0) = (-1) - (-1) = 0$$



As shown in figure h, the graph has equal amounts of area above and 
below the x axis. The area below the axis counts as negative area, so 
the total is zero. 

page 81, problem 3: 




i / Problem 3. 

The rectangular area of the graph is 2, and the area under the curve 
fills a little more than half of that, so let's guess 1.4. 



$$\int_0^2\left(-x^2 + 2x\right)dx
  = \left(-\frac{x^3}{3} + x^2\right)\Bigg|_0^2
  = (-8/3 + 4) - (0) = 4/3$$

This is roughly what we were expecting from our visual estimate.

page 81, problem 4:

Over this interval, the value of the sin function varies from 0 to 1, and
it spends more time above 1/2 than below it, so we expect the average
to be somewhat greater than 1/2. The exact result is

$$\frac{1}{\pi}\int_0^\pi\sin x\,dx = \frac{1}{\pi}(-\cos x)\Big|_0^\pi
  = \frac{1}{\pi}\left[-\cos\pi - (-\cos 0)\right] = \frac{2}{\pi},$$

which is, as expected, somewhat more than 1/2. 
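
The definite integrals in problems 2 through 4 can be confirmed with a
simple midpoint rule. This is a sketch: the integrand for problem 3 is taken
to be $-x^2 + 2x$ on the interval from 0 to 2, and the helper name and step
count are arbitrary choices.

```python
import math

def midpoint_integral(func, a, b, steps=100000):
    """Midpoint-rule estimate of the integral of func from a to b."""
    dx = (b - a) / steps
    return sum(func(a + (i + 0.5) * dx) for i in range(steps)) * dx

full_period = midpoint_integral(math.sin, 0.0, 2 * math.pi)
parabola = midpoint_integral(lambda x: -x**2 + 2 * x, 0.0, 2.0)
average = midpoint_integral(math.sin, 0.0, math.pi) / math.pi

print(full_period)           # close to 0
print(parabola, 4 / 3)       # close to 4/3
print(average, 2 / math.pi)  # close to 2/pi, a bit more than 1/2
```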

page 81, problem 5: 

Consider a function $y(x)$ defined on the interval from $x = 0$ to 2 like
this:

$$y(x) = \begin{cases} -1 & \text{if } 0 \le x \le 1 \\
                        1 & \text{if } 1 < x \le 2 \end{cases}$$

The mean value of y is zero, but y never equals zero. 

page 81, problem 6: 

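Let $\dot{x}$ be defined as

$$\dot{x}(t) = \begin{cases} 0 & \text{if } t < 0 \\
                             1 & \text{if } t \ge 0 \end{cases}$$

Integrating this function up to $t$ gives

$$x(t) = \begin{cases} 0 & \text{if } t \le 0 \\
                       t & \text{if } t \ge 0 \end{cases}$$

The derivative of $x$ at $t = 0$ is undefined, and therefore integration
followed by differentiation doesn't recover the original function $\dot{x}$.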

page 82, problem 10: The claim is false for indefinite integrals, since
indefinite integrals can have a constant of integration. So, for example,
a possible indefinite integral of $x^2$ is $x^3/3 + 7$, which is neither even
nor odd. The fundamental theorem doesn't even refer to indefinite integrals,
which are simply defined through inverse differentiation.

Let's fix the claim by changing $g$ to a definite integral,
$g(x) = \int_0^x f(u)\,du$. The claim is now true. However, the proof still
doesn't quite work. We've established that all odd functions have even
derivatives, but we haven't ruled out possibilities such as functions that
are neither even nor odd, but that have even derivatives.

Solutions for chapter 5 

page 97, problem 15: 

It's pretty trivial to generalize from $e$ to $b$. If we write $b^x$ as
$e^{x\ln b}$, then we can substitute $u = x\ln b$ and reduce the $b \ne e$
case to $b = e$.

The generalization of the exponent of x from 2 to a is less straightfor- 
ward. To do it with a = 2, we needed two integrations by parts, so 
clearly if we wanted to do a case with a = 37, we could do it with 37 
integrations by parts. However, we would have no easy way to write 
down the complete answer without going through the whole tedious 
calculation. Furthermore, this is only going to work if a is a positive 
integer. 

page 97, problem 17: The obvious substitution is $u = x^p$, which leads
to the form $\int e^u u^{1/p - 1}\,du$. If the exponent $1/p - 1$ equals a
nonnegative integer $n$, then through $n$ integrations by parts, we can
reduce this to the form $\int e^x\,dx$. This requires
$p = 1, 1/2, 1/3, \ldots$

Solutions for chapter 6 

page 102, problem 4: 

The method of finding the indefinite integral is discussed in example 67
on p. 89 and problem 15 on p. 97. The result is
$-(\ln 2)^{-3}e^{u}(u^2 - 2u + 2)$, where $u = -x\ln 2$. Plugging in the
limits of integration, we obtain $2(\ln 2)^{-3}$.
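
A direct numerical integration agrees with this value. The sketch below
assumes the integral in question is $\int_0^\infty x^2 2^{-x}\,dx$ (inferred
from the substitution above) and cuts the upper limit off at 200, where the
integrand is negligible; the cutoff and step count are arbitrary.

```python
import math

def midpoint_integral(func, a, b, steps):
    """Midpoint-rule estimate of the integral of func from a to b."""
    dx = (b - a) / steps
    return sum(func(a + (i + 0.5) * dx) for i in range(steps)) * dx

approx = midpoint_integral(lambda x: x**2 * 2.0 ** (-x), 0.0, 200.0, 200000)
exact = 2 / math.log(2) ** 3
print(approx, exact)
```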

Solutions for chapter 7 

page 112, problem 1: 

We can define the sequence $f(n)$ as converging to $\ell$ if the following
is true: for any positive real number $\epsilon$, there exists an integer
$N$ such that for all $n$ greater than $N$, the value of $f$ lies within the
range from $\ell - \epsilon$ to $\ell + \epsilon$.

page 112, problem 2: 

(a) The convergence of the series is defined in terms of the convergence
of its partial sums, which are 1, 0, 1, 0, $\ldots$ In the notation used in
the definition given in the solution to problem 1 above, suppose we pick
$\epsilon = 1/4$. Then there is clearly no way to choose any numbers $\ell$
and $N$ that would satisfy the definition, for regardless of $N$, $\ell$
would have to be both greater than 3/4 and less than 1/4 in order to agree
with the zeroes and ones that occur beyond the $N$th member of the sequence.

(b) As remarked on page 104, the axioms of the real number system, 
such as associativity, only deal with finite sums, not infinite ones. To see 
that absurd conclusions result from attempting to apply them to infinite 
sums, consider that by the same type of argument we could group the 
sum as 1 + (—1 + 1) + (—1 + 1) + . . ., which would equal 1. 

page 112, problem 3: 

The quantity $x^n$ can be reexpressed as $e^{n\ln x}$, where $\ln x$ is
negative by hypothesis. The integral of this exponential with respect to $n$
is a similar exponential with a constant factor in front, and this converges
as $n$ approaches infinity.

page 112, problem 4: 

(a) Applying the integral test, we find that the integral of $1/x^2$ is
$-1/x$, which converges as $x$ approaches infinity, so the series converges
as well.

(b) This is an alternating series whose terms approach zero, so it con- 
verges. However, the terms get small extremely slowly, so an extraor- 
dinarily large number of terms would be required in order to get any 
kind of decent approximation to the sum. In fact, it is impossible to 
carry out a straightforward numerical evaluation of this sum because 
it would require such an enormous number of terms that the rounding 
errors would overwhelm the result. 

(c) This converges by the ratio test, because the ratio of successive terms 
approaches 0. 

(d) Split the sum into two sums, one for the 1103 term and one for
the $26390k$. The ratio of the two factorials is always less than $4^{4k}$,
so discarding constant factors, the first sum is less than a geometric
series with $x = (4/396)^4 < 1$, and must therefore converge. The second
sum is less than a series of the form $kx^k$. This one also converges, by
the integral test. (It has to be integrated with respect to $k$, not $x$,
and the integration can be done by parts.) Since both separate sums
converge, the entire sum converges. This bizarre-looking expression was
formulated and shown to equal $1/\pi$ by the self-taught genius Srinivasa
Ramanujan (1887-1920).
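
The rapid convergence is easy to see numerically. The sketch below uses the
standard statement of Ramanujan's formula,
$1/\pi = \frac{2\sqrt{2}}{9801}\sum_k \frac{(4k)!\,(1103 + 26390k)}{(k!)^4\,396^{4k}}$;
the prefactor $2\sqrt{2}/9801$ is one of the constant factors that the
argument above discards, and the function name is invented for this example.

```python
import math

def ramanujan_partial(terms):
    """Partial sum of Ramanujan's series for 1/pi, using `terms` terms."""
    total = 0.0
    for k in range(terms):
        numerator = math.factorial(4 * k) * (1103 + 26390 * k)
        denominator = math.factorial(k) ** 4 * 396 ** (4 * k)
        total += numerator / denominator
    return 2 * math.sqrt(2) / 9801 * total

for terms in (1, 2, 3):
    print(terms, 1 / ramanujan_partial(terms), math.pi)
```

Each additional term contributes roughly eight more correct digits, so a
double-precision calculation is exhausted after two or three terms.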

page 112, problem 5: E.g., $\sum_{n=0}^\infty \sin n$ diverges, but the
ratio test won't establish that, because the limit
$\lim_{n\to\infty}|\sin(n + 1)/\sin(n)|$ does not exist.

page 114, problem 14: The $n$th term $a_n$ can be rewritten as
$2/[n(n + 1)]$, and using partial fractions this can be changed into
$2/n - 2/(n + 1)$. Let the partial sums be $s_n = \sum_{k=1}^n a_k$. For
insight, let's write out $s_3$:

$$s_3 = \left(\frac{2}{1} - \frac{2}{2}\right)
      + \left(\frac{2}{2} - \frac{2}{3}\right)
      + \left(\frac{2}{3} - \frac{2}{4}\right)$$

This is called a telescoping series. The second part of one term cancels
out with the first part of the next. Therefore we have

$$s_3 = \frac{2}{1} - \frac{2}{4},$$

and in general

$$s_n = \frac{2}{1} - \frac{2}{n + 1}.$$

Letting $n \to \infty$, we find that the series sums to 2.
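
The telescoping can be verified exactly with rational arithmetic. This is a
small sketch using Python's `fractions` module; the function name is
invented for the example.

```python
from fractions import Fraction

def partial_sum(n):
    """Exact partial sum of the terms 2/[k(k+1)] for k = 1..n."""
    return sum(Fraction(2, k * (k + 1)) for k in range(1, n + 1))

for n in (3, 10, 100, 1000):
    print(n, partial_sum(n), 2 - Fraction(2, n + 1))
```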

page 114, problem 17: Yes, it converges. To see this, consider that its 
graph consists of a series of peaks and valleys, each of which is narrower 
than the last and therefore has less area. In fact, the width of these 
humps approaches zero, so that the area approaches zero. This means 
that the integral can be represented as a decreasing, alternating series 
that approaches zero, which must converge. 

page 113, problem 13: There are certainly some special values of $x$
for which it does converge, such as 0 and $\pi$. For a general value of $x$,
however, things become more complicated. Let the $n$th term be given
by the function $t(n)$. $|t|$ converges to a limit, since the first
application of the sine function brings us into the range
$0 \le |t| \le 1$, and from then on, $|t|$ is decreasing and bounded below
by 0. It can't approach a nonzero limit, for given such a limit $t^*$, there
would always be values of $t$ slightly greater than $t^*$ such that $\sin t$
was less than $t^*$. Therefore the terms in the sum approach zero. This is
necessary but not sufficient for the series to converge.

Once $t$ gets small enough, we can approximate the sine using a Taylor
series. Approximating the discrete function $t$ by a continuous one, we
have $dt/dn \approx -(1/6)t^3$, which can be rewritten as
$t^{-3}\,dt \approx -(1/6)\,dn$. This is known as separation of variables.
Integrating, we find that at large values of $n$, where the constant of
integration becomes negligible, $t \approx \pm\sqrt{3/n}$. Since the
integral of $n^{-1/2}$ grows without bound, the sum diverges by the integral
test. Therefore the sum diverges for all values of $x$ except for multiples
of $\pi$, which cause $t$ to hit zero immediately without passing through the
region where the Taylor series is a good approximation.
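
The asymptotic form $t \approx \sqrt{3/n}$ can be tested by brute force.
The sketch below applies the sine function repeatedly to a starting value
and compares the result with $\sqrt{3/n}$; the starting value 1.0 and the
iteration counts are arbitrary choices for the illustration.

```python
import math

def iterate_sine(x, n):
    """Apply the sine function n times, starting from x."""
    t = x
    for _ in range(n):
        t = math.sin(t)
    return t

for n in (100, 10000, 100000):
    t = iterate_sine(1.0, n)
    print(n, t, math.sqrt(3 / n), t / math.sqrt(3 / n))
```

The ratio in the last column drifts toward 1 as $n$ grows, which is the
behavior the separation-of-variables estimate predicts.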

page 115, problem 20: Our first impression is that it must converge,
since the $2^{-n}$ factor shrinks much more rapidly than the $n^2$ factor
grows. To prove this rigorously, we can apply the integral test. The
relevant improper integral was carried out in problem 4 on p. 102.

Finding the sum is far more difficult, and there is no obvious technique
that is guaranteed to work. However, the integral test suggests an
approach that does lead to a solution. The fact that the indefinite integral
can be evaluated suggests that perhaps the partial sum

$$S_n = \sum_{j=1}^n \frac{j^2}{2^j}$$

can also be evaluated. Furthermore, the fact that the integral was of
the form $2^{-x}P(x)$, for some polynomial $P$, suggests that perhaps $S_n$
is of the same form. Based on this conjecture, we try to determine the
polynomial, writing $P(n) = an^2 + bn + c$:

$$S_n - S_{n-1} = n^2 2^{-n}$$
$$n^2 2^{-n} = 2^{-n}\left[-an^2 + (4a - b)n - 2a + 2b - c\right]$$

Solving for $a$, $b$, and $c$ results in $P(n) = -n^2 - 4n - 6$. This gives
the correct value for the difference $S_n - S_{n-1}$, but doesn't give
$S_0 = 0$ as it should. But this is easy to fix simply by changing the form
of our conjectured partial sum slightly to $S_n = 2^{-n}P(n) + k$, where
$k = 6$. Evaluating $\lim_{n\to\infty} S_n$, we get 6.
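
The conjectured closed form can be checked exactly for as many terms as we
like. This is a sketch using exact rational arithmetic; the function names
are invented for the example.

```python
from fractions import Fraction

def partial_sum(n):
    """Exact value of the sum of j^2 / 2^j for j = 1..n."""
    return sum(Fraction(j * j, 2**j) for j in range(1, n + 1))

def closed_form(n):
    """S_n = 2^-n P(n) + 6 with P(n) = -n^2 - 4n - 6."""
    return 6 - Fraction(n * n + 4 * n + 6, 2**n)

for n in (0, 1, 5, 20, 60):
    print(n, partial_sum(n) == closed_form(n), float(closed_form(n)))
```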

page 115, problem 21: The function $\cos^2$ averages to 1/2, so we
might naively expect that $\cos^n$ would average to about $2^{-n/2}$, in
which case the sum would converge for any value of $p$ whatsoever. But the
average is misleading, because there are some "lucky" values of $n$ for
which $\cos^2 n \approx 1$, and these will have a disproportionate effect on
the sum. We know by the integral test that $\sum 1/n$ diverges, but
$\sum 1/n^2$ converges, so clearly if $p \ge 2$, then even these occasional
"lucky" terms will not cause divergence.

What about $p = 1$? Suppose we have some value of $n$ for which
$\cos^2 n = 1 - \epsilon$, where $\epsilon$ is some small number. If this is
to happen, then we must have $n = k\pi + \delta$, where $k$ is an integer and
$\delta$ is small, so that $\cos^2 n \approx 1 - \delta^2$, i.e.,
$\epsilon \sim \delta^2$. This occurs with a probability proportional
to $\delta$, and the resulting contribution to the sum is about
$(1 - \delta^2)^n/n$, which by the binomial theorem is roughly of order
$1/n$ if $n\delta^2 \sim 1$. This happens with probability $\sim n^{-1/2}$,
so the expected value of the $n$th term is $\sim n^{-3/2}$. Since
$\sum n^{-3/2}$ converges by the integral test, this suggests, but does not
prove rigorously, that we also get convergence for $p = 1$.

A similar argument suggests that the sum diverges for $p = 0$.
Answers to self-checks for chapter 9 

page 124, problem 9: First we rewrite the integrand as

$$\frac{1}{4}\left(e^{ix} + e^{-ix}\right)\left(e^{2ix} + e^{-2ix}\right)
  = \frac{1}{4}\left(e^{3ix} + e^{-3ix} + e^{ix} + e^{-ix}\right).$$

The indefinite integral is

$$\frac{1}{12i}\left(e^{3ix} - e^{-3ix}\right)
  + \frac{1}{4i}\left(e^{ix} - e^{-ix}\right).$$

Evaluating this at 0 gives 0, while at $\pi/2$ we find 1/3. The result is
1/3.

page 124, problem 8: 

$$\sin(a + b) = \left(e^{i(a+b)} - e^{-i(a+b)}\right)/2i$$
$$= \left(e^{ia}e^{ib} - e^{-ia}e^{-ib}\right)/2i$$
$$= \left[(\cos a + i\sin a)(\cos b + i\sin b)
        - (\cos a - i\sin a)(\cos b - i\sin b)\right]/2i$$
$$= \cos a\sin b + \sin a\cos b$$

By a similar computation, we find
$\cos(a + b) = \cos a\cos b - \sin a\sin b$.

page 124, problem 10: If $z^3 = 1$, then we know that $|z| = 1$, since
cubing $z$ cubes its magnitude. Cubing $z$ triples its argument, so the
argument of $z$ must be a number that, when tripled, is equivalent to an
angle of zero. There are three possibilities: $0 \times 3 = 0$,
$(2\pi/3) \times 3 = 2\pi$, and $(4\pi/3) \times 3 = 4\pi$. (Other
possibilities, such as $32\pi/3$, are equivalent to one of these.) The
solutions are:

$$z = 1,\ e^{2\pi i/3},\ e^{4\pi i/3}$$
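
These three roots are easy to confirm with complex arithmetic. The sketch
below builds them from their arguments using Python's `cmath` module and
cubes each one.

```python
import cmath
import math

# The three arguments found above: 0, 2*pi/3, and 4*pi/3.
roots = [cmath.exp(2j * math.pi * k / 3) for k in range(3)]

for z in roots:
    # Each root has magnitude 1, and its cube is 1 up to rounding error.
    print(z, abs(z), z**3)
```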

page 124, problem 11: We can think of this as a polynomial in $x$ or a
polynomial in $y$; their roles are symmetric. Let's call $x$ the variable.
By the fundamental theorem of algebra, it must be possible to factor it
into a product of three linear factors, if the coefficients are allowed to
be complex. Each of these factors causes the product to be zero for a
certain value of $x$. But the condition for the expression to be zero is
$x^3 = y^3$, which basically means that the ratio of $x$ to $y$ must be a
third root of 1. The problem, then, boils down to finding the three third
roots of 1, as in problem 10. Using the result of that problem, we find that
there are zeroes when $x/y$ equals 1, $e^{2\pi i/3}$, and $e^{4\pi i/3}$.
This tells us that the factorization is
$(x - y)(x - e^{2\pi i/3}y)(x - e^{4\pi i/3}y)$.

The second part of the problem asks us to factorize as much as possible
using real coefficients. Our only hope of doing this is to multiply out
the two factors that involve complex coefficients, and see if they produce
something real. In fact, we can anticipate that it will work, because the
coefficients are complex conjugates of one another, and when a quadratic
has two complex roots, they are conjugates. The result is
$(x - y)(x^2 + xy + y^2)$.
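
Both factorizations can be spot-checked numerically. The sketch below
assumes the polynomial being factored is $x^3 - y^3$ (inferred from the
condition $x^3 = y^3$ above) and evaluates both products at an arbitrary
pair of test values.

```python
import cmath
import math

omega = cmath.exp(2j * math.pi / 3)  # primitive third root of 1

def complex_factors(x, y):
    """Product of the three linear factors with complex coefficients."""
    return (x - y) * (x - omega * y) * (x - omega**2 * y)

def real_factors(x, y):
    """Product of the linear and quadratic factors with real coefficients."""
    return (x - y) * (x * x + x * y + y * y)

x, y = 1.7, -0.4  # arbitrary test values
print(x**3 - y**3, real_factors(x, y), complex_factors(x, y))
```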

page 124, problem 14: (a) Let $m = 10,000$. We know that integrals
of this form can be done, at least in theory, using partial fractions.
The ten thousand roots of the polynomial will be ten thousand points
evenly spaced around the unit circle in the complex plane. They can
be expressed as $r_k = e^{2\pi ik/m}$ for $k = 0$ to $m - 1$. Since all the
roots are unequal, the partial-fraction form of the integrand contains only
terms of the form $A_k/(x - r_k)$. Integrating, we would get a sum of ten
thousand terms of the form $A_k\ln(x - r_k)$.

(b) I tried inputting the integral into three different pieces of symbolic 
math software: the open-source packages Yacas and Maxima, and the 
web-based interface to Wolfram's proprietary Mathematica software at 
integrals.com. Maxima gave a partially integrated result after a couple
of minutes of computation. Yacas crashed. Mathematica's web interface
timed out and suggested buying a stand-alone copy of Mathematica. All
three programs probably embarked on the computation of the $A_k$ by
attempting to solve 10,000 equations in the 10,000 unknowns $A_k$, and
then ran out of resources (either memory or CPU time).

(c) The expressions look nicer if we let $\omega = e^{2\pi i/m}$, so that
$r_k = \omega^k$. The residue method gives

$$\frac{1}{x^m - 1} = \sum_{k=0}^{m-1}\frac{1}{(x - \omega^k)\,m\omega^{k(m-1)}}.$$

Integration gives

$$\int\frac{dx}{x^m - 1}
  = \sum_{k=0}^{m-1}\frac{1}{m\omega^{k(m-1)}}\ln\left(x - \omega^k\right).$$

(Thanks to math.stackexchage.com user zulon for suggesting the
residue method, and to Robert Israel for pointing out that for
$|x| < 1$ this can also be expressed as a hypergeometric function:
$(-x)\,{}_2F_1(\frac{1}{m}, 1; 1 + \frac{1}{m}; x^m)$.)

D References and Further Reading



Further Reading 

The amount of high-quality material on elementary calculus available 
for free online these days is an embarrassment of riches, so most of my 
suggestions for reading are online. I'll refer to books in this section only 
by the surname of the first author; the references section below tells you 
where to find the book online or in print. 

The reader who wants to learn more about the hyperreal system might 
want to start with Stroyan and the Mathforum.org article. For more 
depth, one could next read the relevant parts of Keisler. The standard 
(difficult) treatise on the subject is Robinson. 

Given sufficient ingenuity, it's possible to develop a surprisingly large 
amount of the machinery of calculus without using limits or infinitesi- 
mals. Two examples of such treatments that are freely available online 
are Marsden and Livshits. Marsden gives a geometrical definition of the 
derivative similar to the one used in ch. 1 of this book, but in my opin- 
ion his efforts to develop a sufficient body of techniques without limits 
or infinitesimals end up bogging down in complicated formulations that 
have the same flavor as the Weierstrass definition of the limit and are 
just as complicated. Livshits treats differentiation of rational functions 
as division of functions. 

Tall gives an interesting construction of a number system that is smaller
than the hyperreals, but easier to construct explicitly, and sufficient to
handle calculus involving analytic functions.

E.1 Review

Algebra 

Quadratic equation: 

The solutions of ax 2 + bx + c = 

-6±\A 2 -4ac 

are x — ^5 

Logarithms and exponentials: 



Trigonometry with a right 
triangle 



h = hypotenuse 




= opposite 

side 



a = adjacent side 

sin (9 = o/h cos 8 — a/h tan# = of a 
Pythagorean theorem: h 2 = a 2 + o 2 



ln(afe) = In a + In I 



a + b a b 

e — e e 



1 X \l\ X 

me = e — x 
ln(a ) = b In a 



Trigonometry with any triangle 



Geometry, area, and volume 




area of a triangle of 


= \bh 




base b and height h 




Tunci 


circumference of a 


= 2ixr 




circle of radius r 






area of a circle of ra- 


— nr 


sinhrc 


dius r 






surface area of a 


— 4irr 2 




sphere of radius r 






volume of a sphere of 


= i- 3 


tanhx 


radius r 







Law of Sines: 

sin a sin /3 sin 7 
~A~ = ~B~ = ~C~ 

Law of Cosines: 

C 2 = A 2 + B 2 - 2AB cos 7 
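Two of the review formulas are easy to try out numerically. This is a minimal sketch rather than a definitive implementation; the function names are hypothetical, angles are assumed to be in radians, and the quadratic solver handles only the case of real roots.

```python
import math


def quadratic_roots(a, b, c):
    """Return the two real solutions of a*x**2 + b*x + c = 0.

    Raises ValueError if the discriminant b**2 - 4*a*c is negative.
    """
    disc = b * b - 4 * a * c
    if disc < 0:
        raise ValueError("no real solutions")
    s = math.sqrt(disc)
    return ((-b + s) / (2 * a), (-b - s) / (2 * a))


def third_side(A, B, gamma):
    """Law of Cosines: the side C opposite the angle gamma (in radians)."""
    return math.sqrt(A * A + B * B - 2 * A * B * math.cos(gamma))


if __name__ == "__main__":
    # x**2 - 3x + 2 = (x - 1)(x - 2)
    print(quadratic_roots(1, -3, 2))
    # With a right angle, the Law of Cosines reduces to the Pythagorean theorem.
    print(third_side(3, 4, math.pi / 2))
```

With gamma equal to a right angle the cosine term vanishes, so the 3-4-5 triangle is a convenient check that the Law of Cosines contains the Pythagorean theorem as a special case.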

E.2 Hyperbolic functions

$\sinh x = \frac{e^x - e^{-x}}{2}$

$\cosh x = \frac{e^x + e^{-x}}{2}$

$\tanh x = \frac{\sinh x}{\cosh x}$
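The exponential definitions of the hyperbolic functions can be compared with a library's built-in versions. The sketch below is illustrative only; the names `my_sinh`, `my_cosh`, and `my_tanh` are hypothetical, and Python's standard `math` module serves as the reference.

```python
import math


def my_sinh(x):
    """Hyperbolic sine from its definition, (e^x - e^-x) / 2."""
    return (math.exp(x) - math.exp(-x)) / 2


def my_cosh(x):
    """Hyperbolic cosine from its definition, (e^x + e^-x) / 2."""
    return (math.exp(x) + math.exp(-x)) / 2


def my_tanh(x):
    """Hyperbolic tangent as the ratio sinh x / cosh x."""
    return my_sinh(x) / my_cosh(x)


if __name__ == "__main__":
    x = 0.7
    print(my_sinh(x), math.sinh(x))
    print(my_cosh(x), math.cosh(x))
    print(my_tanh(x), math.tanh(x))
    # The identity cosh^2 x - sinh^2 x = 1 follows from the definitions.
    print(my_cosh(x) ** 2 - my_sinh(x) ** 2)
```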
E.3 Calculus

Let $f$ and $g$ be functions of $x$, and let $c$ be a constant.

Linearity of the derivative:

$\frac{d}{dx}(cf) = c\frac{df}{dx}$

$\frac{d}{dx}(f + g) = \frac{df}{dx} + \frac{dg}{dx}$

Rules for differentiation

The chain rule:

$\frac{d}{dx}f(g(x)) = f'(g(x))g'(x)$

Derivatives of products and quotients:

$\frac{d}{dx}(fg) = \frac{df}{dx}g + \frac{dg}{dx}f$

$\frac{d}{dx}\left(\frac{f}{g}\right) = \frac{f'}{g} - \frac{fg'}{g^2}$

Integral calculus

The fundamental theorem of calculus:

$\int \frac{df}{dx}\,dx = f$

Linearity of the integral:

$\int cf(x)\,dx = c\int f(x)\,dx$

$\int [f(x) + g(x)]\,dx = \int f(x)\,dx + \int g(x)\,dx$

Integration by parts:

$\int f\,dg = fg - \int g\,df$

Table of integrals

$\int x^m\,dx = \frac{1}{m+1}x^{m+1} + c, \quad m \ne -1$

$\int \frac{dx}{x} = \ln|x| + c$

$\int \sin x\,dx = -\cos x + c$

$\int \cos x\,dx = \sin x + c$

$\int e^x\,dx = e^x + c$

$\int \ln x\,dx = x\ln x - x + c$

$\int \frac{dx}{1 + x^2} = \tan^{-1}x + c$

$\int \frac{dx}{\sqrt{1 - x^2}} = \sin^{-1}x + c$

$\int \cosh x\,dx = \sinh x + c$

$\int \sinh x\,dx = \cosh x + c$

$\int \tan x\,dx = -\ln|\cos x| + c$

$\int \cot x\,dx = \ln|\sin x| + c$

$\int \sec x\,dx = \ln|\sec x + \tan x| + c$

$\int \sec^2 x\,dx = \tan x + c$

$\int \csc^2 x\,dx = -\cot x + c$
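By the fundamental theorem of calculus, each entry in a table of integrals can be spot-checked by differentiating the right-hand side and comparing with the integrand. The sketch below does this numerically with a central difference. It is a rough check at a single test point, not a proof, and the names `derivative`, `TABLE`, and `max_error` are hypothetical.

```python
import math


def derivative(F, x, h=1e-6):
    """Approximate F'(x) with a central difference of step h."""
    return (F(x + h) - F(x - h)) / (2 * h)


# Each entry is (label, integrand f, claimed antiderivative F).
TABLE = [
    ("x^3", lambda x: x**3, lambda x: x**4 / 4),
    ("1/x", lambda x: 1 / x, lambda x: math.log(abs(x))),
    ("sin x", math.sin, lambda x: -math.cos(x)),
    ("cos x", math.cos, math.sin),
    ("e^x", math.exp, math.exp),
    ("ln x", math.log, lambda x: x * math.log(x) - x),
    ("1/(1+x^2)", lambda x: 1 / (1 + x * x), math.atan),
    ("1/sqrt(1-x^2)", lambda x: 1 / math.sqrt(1 - x * x), math.asin),
    ("cosh x", math.cosh, math.sinh),
    ("sinh x", math.sinh, math.cosh),
    ("tan x", math.tan, lambda x: -math.log(abs(math.cos(x)))),
    ("cot x", lambda x: 1 / math.tan(x), lambda x: math.log(abs(math.sin(x)))),
    ("sec x", lambda x: 1 / math.cos(x),
     lambda x: math.log(abs(1 / math.cos(x) + math.tan(x)))),
    ("sec^2 x", lambda x: 1 / math.cos(x) ** 2, math.tan),
    ("csc^2 x", lambda x: 1 / math.sin(x) ** 2, lambda x: -1 / math.tan(x)),
]


def max_error(x=0.5):
    """Largest mismatch between F'(x) and f(x) over all table entries."""
    return max(abs(derivative(F, x) - f(x)) for _, f, F in TABLE)


if __name__ == "__main__":
    for label, f, F in TABLE:
        print(label, f(0.5), derivative(F, 0.5))
```

The test point x = 0.5 is chosen so that every entry is defined there; entries such as the one for 1/sqrt(1-x^2) would fail outside the interval (-1, 1).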